Google Cloud Professional Data Engineer — Question 227
Your organization's data assets are stored in BigQuery, Pub/Sub, and a PostgreSQL instance running on Compute Engine. Because there are multiple domains and diverse teams using the data, teams in your organization are unable to discover existing data assets. You need to design a solution to improve data discoverability while keeping development and configuration efforts to a minimum. What should you do?
Answer options
- A. Use Data Catalog to automatically catalog BigQuery datasets. Use Data Catalog APIs to manually catalog Pub/Sub topics and PostgreSQL tables.
- B. Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.
- C. Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use custom connectors to manually catalog PostgreSQL tables.
- D. Use custom connectors to manually catalog BigQuery datasets, Pub/Sub topics, and PostgreSQL tables.
Correct answer: B
Explanation
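Option B is correct because Data Catalog automatically ingests metadata from BigQuery and Pub/Sub, so datasets, tables, and topics become searchable without extra setup. PostgreSQL running on Compute Engine is not a natively integrated source, so its tables must be registered as custom entries through the Data Catalog APIs. Option A adds unnecessary work by cataloging Pub/Sub topics manually even though Data Catalog already covers them. Option C handles BigQuery and Pub/Sub the same way as B, but it relies on custom connectors for PostgreSQL, which under the question's framing means more to build and maintain than calling the Data Catalog APIs directly. Option D catalogs all three sources by hand and ignores the automatic integration entirely, which contradicts the goal of minimizing development and configuration effort.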
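To make the manual step for PostgreSQL more concrete, here is a minimal sketch of what registering one table as a custom entry could look like. It only builds the request URL and JSON body for the Data Catalog REST method `projects.locations.entryGroups.entries.create`; it does not send anything. The project, location, entry group, host, database, and table names are hypothetical, and the `linkedResource` format is an assumption chosen for illustration.

```python
import json
import re

API_ROOT = "https://datacatalog.googleapis.com/v1"


def entry_id_for(schema: str, table: str) -> str:
    """Derive an entry ID that uses only letters, digits, and underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", f"{schema}_{table}")


def build_entry(host, database, schema, table, columns):
    """Build the JSON body of a custom Entry describing a PostgreSQL table.

    `columns` is a list of (name, type) pairs read from the database.
    """
    return {
        "displayName": f"{schema}.{table}",
        # Custom (non-Google) sources are described with the user-specified fields.
        "userSpecifiedSystem": "postgresql",
        "userSpecifiedType": "table",
        # Hypothetical identifier format for the source table (an assumption).
        "linkedResource": f"//{host}/{database}/{schema}/{table}",
        "schema": {
            "columns": [{"column": name, "type": col_type} for name, col_type in columns]
        },
    }


def create_entry_url(project, location, entry_group, entry_id):
    """Build the URL to which the entry body would be POSTed."""
    parent = f"projects/{project}/locations/{location}/entryGroups/{entry_group}"
    return f"{API_ROOT}/{parent}/entries?entryId={entry_id}"


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    body = build_entry(
        host="pg-vm.internal",
        database="sales",
        schema="public",
        table="orders",
        columns=[("order_id", "bigint"), ("created_at", "timestamp")],
    )
    url = create_entry_url(
        "my-project", "us-central1", "postgres_catalog", entry_id_for("public", "orders")
    )
    print(url)
    print(json.dumps(body, indent=2))
```

In practice the same request would more likely be issued through an official client library with proper authentication, and the entry group would need to exist first. The sketch is meant only to show that each PostgreSQL table has to be described explicitly, whereas BigQuery and Pub/Sub assets need no such step.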