Google Cloud Professional Data Engineer — Question 319
Your company is in the process of migrating its on-premises data warehousing solutions to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to apply updates from multiple transactional database sources on a daily basis. With BigQuery, your company hopes to improve its handling of
CDC so that changes to the source systems are available to query in BigQuery in near-real time using log-based CDC streams, while also optimizing for the performance of applying changes to the data warehouse. Which two steps should they take to ensure that changes are available in the BigQuery reporting table with minimal latency while reducing compute overhead? (Choose two.)
Answer options
- A. Perform a DML INSERT, UPDATE, or DELETE to replicate each individual CDC record in real time directly on the reporting table.
- B. Insert each new CDC record and corresponding operation type to a staging table in real time.
- C. Periodically DELETE outdated records from the reporting table.
- D. Periodically use a DML MERGE to perform several DML INSERT, UPDATE, and DELETE operations at the same time on the reporting table.
- E. Insert each new CDC record and corresponding operation type in real time to the reporting table, and use a materialized view to expose only the newest version of each unique record.
Correct answer: B, D
Explanation
Option B is correct because inserting CDC records into a staging table allows for real-time processing while minimizing direct updates to the reporting table. Option D is also correct as using DML MERGE consolidates multiple operations into one, which reduces compute overhead. Options A and E are less efficient, as they involve more frequent direct updates to the reporting table or do not optimize the process effectively. Option C does not address the real-time requirement and focuses only on deleting old records.