Google Cloud Professional Data Engineer — Question 228

You are monitoring your organization’s data lake hosted on BigQuery. The ingestion pipelines read data from Pub/Sub and write the data into tables on BigQuery. After a new version of the ingestion pipelines is deployed, the daily stored data increased by 50%. The volumes of data in Pub/Sub remained the same and only some tables had their daily partition data size doubled. You need to investigate and fix the cause of the data increase. What should you do?

Answer options

Correct answer: C

Explanation

Option C is correct because it investigates the root cause before fixing it: it checks for duplicate rows and uses audit logs and monitoring tools to track job versions, which identifies the source of the extra data. Because Pub/Sub volumes stayed the same while some daily partitions doubled in size, duplicate writes from the new pipeline version are the likely cause. The other options fall short: A deduplicates without investigating the root cause, B addresses only code errors without a comprehensive approach, and D rolls back changes without understanding the issue.
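
The duplicate-row check mentioned in the explanation could be sketched roughly as follows. This is a minimal illustration, not the method the question prescribes: the helper duplicate_check_sql, the table name, and the key column message_id are hypothetical, and the query assumes an ingestion-time partitioned table (so the _PARTITIONTIME pseudo-column is available). For a column-partitioned table, the filter would use the partitioning column instead.

```python
from datetime import date
from typing import Sequence


def duplicate_check_sql(table: str, key_columns: Sequence[str], partition_date: str) -> str:
    """Build a BigQuery Standard SQL query that lists keys appearing
    more than once in a single daily partition.

    All names passed in are illustrative; key_columns should be whatever
    uniquely identifies one ingested message in your schema.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Parse the date so only a valid ISO date ends up in the query text.
    day = date.fromisoformat(partition_date).isoformat()
    keys = ", ".join(key_columns)
    return (
        f"SELECT {keys}, COUNT(*) AS copies\n"
        f"FROM `{table}`\n"
        # Assumption: ingestion-time partitioning, hence _PARTITIONTIME.
        f"WHERE DATE(_PARTITIONTIME) = '{day}'\n"
        f"GROUP BY {keys}\n"
        f"HAVING COUNT(*) > 1\n"
        f"ORDER BY copies DESC"
    )


if __name__ == "__main__":
    # Hypothetical table and key column, for illustration only.
    print(duplicate_check_sql("my_project.lake.events", ["message_id"], "2024-01-15"))
```

Running the generated query against one of the partitions that doubled, and comparing it with a partition written before the deployment, would show whether the extra volume consists of repeated keys. If it does, the audit logs can then be used to tie the duplicate writes to specific jobs and pipeline versions.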