Google Cloud Professional Data Engineer — Question 206
You are part of a healthcare organization where data is organized and managed by respective data owners in various storage services. As a result of this decentralized ecosystem, discovering and managing data has become difficult. You need to quickly identify and implement a cost-optimized solution to assist your organization with the following:
• Data management and discovery
• Data lineage tracking
• Data quality validation
How should you build the solution?
Answer options
- A. Use BigLake to convert the current solution into a data lake architecture.
- B. Build a new data discovery tool on Google Kubernetes Engine that helps with new source onboarding and data lineage tracking.
- C. Use BigQuery to track data lineage, and use Dataprep to manage data and perform data quality validation.
- D. Use Dataplex to manage data, track data lineage, and perform data quality validation.
Correct answer: D
Explanation
The correct choice is D because Dataplex is specifically designed for managing data, tracking lineage, and validating data quality, making it the most suitable solution for the organization's needs. Options A and B do not provide a comprehensive approach to all three requirements, while option C, although useful, does not encompass management and discovery as effectively as Dataplex.