Google Cloud Professional Data Engineer — Question 266
You are building a report-only data warehouse where the data is streamed into BigQuery via the streaming API. Following Google's best practices, you have both a staging and a production table for the data. How should you design your data loading to ensure that there is only one master dataset without affecting performance on either the ingestion or reporting pieces?
Answer options
- A. Have a staging table that is an append-only model, and then update the production table every three hours with the changes written to staging.
- B. Have a staging table that is an append-only model, and then update the production table every ninety minutes with the changes written to staging.
- C. Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every three hours.
- D. Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every thirty minutes.
Correct answer: C
Explanation
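Option C is correct because each record ends up in exactly one place. New rows stream into staging, are moved into production, and are then deleted from staging, so the production table is the single master dataset. Streaming inserts touch only the staging table and reporting queries read only the production table, so neither workload slows the other. A three-hour cycle also leaves a comfortable margin for recently streamed rows to leave BigQuery's streaming buffer. This matters because UPDATE, DELETE, and MERGE statements cannot modify rows that are still in the buffer, and older guidance cited up to 90 minutes before streamed data settles.
Options A and B keep the staging table append-only, so every record persists in both tables and there is no single master dataset; staging also grows without bound. Option D runs the move-and-delete far more often, and a 30-minute cycle risks trying to delete rows that are still in the streaming buffer, which makes the DELETE fail and adds job overhead.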
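The move-and-delete cycle in option C can be sketched as a BigQuery SQL script generated from Python. This is a minimal sketch under several assumptions. The table names, the `ingest_time` column (assumed to be populated when each row is streamed), and the 90-minute margin are all hypothetical. Wrapping both statements in a multi-statement transaction also assumes BigQuery's transaction limitations allow it for these tables. Both statements filter on the same cutoff, so the DELETE removes only the rows the INSERT copied. Rows that arrive during the run stay in staging for the next cycle.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table names: replace with your own project and dataset.
STAGING = "my_project.warehouse.events_staging"
PRODUCTION = "my_project.warehouse.events"
# Hypothetical column: assumes each streamed row carries its ingestion time.
TS_COLUMN = "ingest_time"
# Assumed safety margin so only rows that have left the streaming buffer are moved.
BUFFER_MARGIN = timedelta(minutes=90)


def build_move_script(run_time: datetime, margin: timedelta = BUFFER_MARGIN) -> str:
    """Build a script that copies settled rows from staging to production,
    then deletes exactly those rows from staging."""
    if run_time.tzinfo is None:
        raise ValueError("run_time must be timezone-aware")
    # Only rows older than the cutoff are touched, so recently streamed rows are skipped.
    cutoff = (run_time - margin).astimezone(timezone.utc)
    cutoff_literal = cutoff.strftime("%Y-%m-%d %H:%M:%S UTC")
    predicate = f"{TS_COLUMN} < TIMESTAMP '{cutoff_literal}'"
    return "\n".join([
        "BEGIN TRANSACTION;",
        # Same predicate in both statements: what is copied is what is deleted.
        f"INSERT INTO `{PRODUCTION}` SELECT * FROM `{STAGING}` WHERE {predicate};",
        f"DELETE FROM `{STAGING}` WHERE {predicate};",
        "COMMIT TRANSACTION;",
    ])


if __name__ == "__main__":
    print(build_move_script(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```

The generated script could then run every three hours, for example as a BigQuery scheduled query; that scheduling choice is likewise an assumption rather than part of the question.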