Google Cloud Professional Data Engineer — Question 258
You are designing a data lake on Google Cloud to store vast amounts of customer interaction data from various sources, such as websites, mobile apps, and social media. You need to ensure that this data, which arrives in different formats, is consistently cataloged and easy for data analysts to discover and use. You also want to perform basic data quality checks and transformations before the data is consumed by downstream applications. You need an automated and managed data governance solution. What should you do?
Answer options
- A. Use Cloud Storage as the central repository. Use Vertex AI to classify and process the data and perform data quality checks.
- B. Stream all the data directly into BigQuery, where it is automatically cataloged and governed.
- C. Use Cloud Storage and BigQuery as repositories. Use Dataplex Universal Catalog for metadata discovery, data quality checks, and transformations.
- D. Use Cloud Storage as the central repository. Use a Cloud Run function to catalog, transform the data, and perform data quality checks.
Correct answer: C
Explanation
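The correct answer is C. Cloud Storage holds the raw data in whatever format it arrives, BigQuery holds the structured data used for analysis, and Dataplex Universal Catalog adds a managed governance layer across both: metadata discovery and cataloging, data quality checks, and transformations. This meets the requirement for an automated, managed solution. Option A is wrong because Vertex AI is a machine learning platform, not a cataloging or governance service, so it does not make the data discoverable for analysts. Option B is wrong because streaming everything directly into BigQuery is a poor fit for data that arrives in different formats, and ingestion alone performs none of the required quality checks or transformations. Option D is wrong because a Cloud Run function would require you to write and maintain custom cataloging, transformation, and data quality code, which is the opposite of a managed solution.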
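To make "basic data quality checks" concrete, the following is a minimal sketch in plain Python of the kind of declarative rule a managed service evaluates for you: a check applied to one column, plus a minimum pass ratio. It is not the Dataplex API. The `Rule` class, the `evaluate` function, and the sample rows and column names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a per-value check plus a minimum pass ratio."""
    name: str
    column: str
    check: Callable[[object], bool]
    threshold: float  # minimum fraction of rows that must pass (0.0 to 1.0)


def evaluate(rows: Iterable[dict], rules: List[Rule]) -> Dict[str, dict]:
    """Apply each rule to every row and report pass ratio and status."""
    materialized = list(rows)
    report = {}
    for rule in rules:
        passed = sum(1 for row in materialized if rule.check(row.get(rule.column)))
        # An empty input is treated as passing; that is a choice made for this sketch.
        ratio = passed / len(materialized) if materialized else 1.0
        report[rule.name] = {"pass_ratio": ratio, "ok": ratio >= rule.threshold}
    return report


# Made-up interaction records, not real data.
SAMPLE_ROWS = [
    {"customer_id": "c1", "channel": "web"},
    {"customer_id": "c2", "channel": "mobile"},
    {"customer_id": None, "channel": "social"},
    {"customer_id": "c4", "channel": "fax"},
]

RULES = [
    # Completeness: every row must carry a customer_id.
    Rule("customer_id_not_null", "customer_id", lambda v: v is not None, 1.0),
    # Validity: at least 75% of rows must use a known channel.
    Rule("channel_in_set", "channel", lambda v: v in {"web", "mobile", "social"}, 0.75),
]

REPORT = evaluate(SAMPLE_ROWS, RULES)

if __name__ == "__main__":
    for rule_name, outcome in REPORT.items():
        print(rule_name, outcome)
```

Hand-rolling this logic is essentially what option D asks for. With a managed service, you would declare the rules and leave scheduling, execution, and result storage to the platform.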