Google Cloud Professional Data Engineer — Question 114
You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?
Answer options
- A. Organize your data in a single table, and export, compress, and store the BigQuery data in Cloud Storage.
- B. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.
- C. Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.
- D. Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.
Correct answer: B
Explanation
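Option B is correct. Separate monthly tables limit the scope of a recovery: when a faulty ETL run is discovered, only the affected month has to be restored, not the entire history. Exporting each table to Cloud Storage in a compressed format also keeps backup costs low, since compressed files in Cloud Storage generally cost less to keep than a second full copy in BigQuery. Option A stores everything in a single table, so every export and every restore has to cover the full dataset even when only recent data is corrupted. Option C gives the same targeted recovery as B, but duplicating the data in a second BigQuery dataset roughly doubles BigQuery storage costs, so the backups are not optimized for storage cost. Option D does not meet the recovery requirement: snapshot decorators rely on BigQuery time travel, which reaches back at most 7 days, so an error detected after 2 weeks is already outside the recoverable window.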
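The backup and restore flow behind option B can be sketched in Python with the `google-cloud-bigquery` client library. This is a minimal sketch, not a reference implementation: the `<base>_YYYYMM` table naming, the `backups/...` bucket layout, and the helper names are assumptions made for illustration. So is the choice of Avro with DEFLATE compression, picked here because Avro files carry their own schema, which simplifies the reload.

```python
from datetime import date


def monthly_table_id(dataset: str, base_name: str, month: date) -> str:
    """Build the monthly table ID, e.g. 'analytics.events_202401'.

    The '<base>_YYYYMM' naming scheme is an assumption for this sketch.
    """
    return f"{dataset}.{base_name}_{month:%Y%m}"


def backup_uri(bucket: str, base_name: str, month: date) -> str:
    """Build the Cloud Storage URI for one month's backup (hypothetical layout).

    The '*' wildcard lets BigQuery split a large export across several files.
    """
    return f"gs://{bucket}/backups/{base_name}/{month:%Y%m}/part-*.avro"


def export_month(project: str, dataset: str, base_name: str,
                 bucket: str, month: date) -> None:
    """Export one monthly table to Cloud Storage as compressed Avro."""
    # Imported here so the pure helpers above work without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.AVRO,
        compression=bigquery.Compression.DEFLATE,
    )
    job = client.extract_table(
        f"{project}.{monthly_table_id(dataset, base_name, month)}",
        backup_uri(bucket, base_name, month),
        job_config=job_config,
    )
    job.result()  # Block until the export finishes; raises on failure.


def restore_month(project: str, dataset: str, base_name: str,
                  bucket: str, month: date) -> None:
    """Reload one month's backup, overwriting the corrupted monthly table."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.AVRO,
        # Replace the table contents instead of appending to the bad data.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    job = client.load_table_from_uri(
        backup_uri(bucket, base_name, month),
        f"{project}.{monthly_table_id(dataset, base_name, month)}",
        job_config=job_config,
    )
    job.result()  # Block until the load finishes; raises on failure.
```

Because each month lives in its own table and its own backup prefix, recovering from a bad ETL run only requires calling `restore_month` for the months that run touched. Moving older backup objects to a colder Cloud Storage class would be one way to reduce cost further, depending on how often restores are expected.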