Google Cloud Professional Data Engineer — Question 114

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

Answer options

Correct answer: B

Explanation

Option B is correct because splitting the data into separate monthly tables lets you restore only the period affected by a faulty ETL run, and exporting and compressing those tables to Cloud Storage keeps backup storage costs low. Option A uses a single table, so recovery cannot be limited to the affected month and the whole table has to be managed as one unit. Option C duplicates the data inside BigQuery, which means paying BigQuery storage rates for a second copy and maintaining that copy alongside the original. Option D allows recovery, but it may not be as cost-effective as compressed exports in Cloud Storage for backups that must cover errors detected up to two weeks later.
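The approach in option B can be sketched as a small script that lists the monthly tables and builds one compressed export command per table. This is only an illustration under stated assumptions: the `events_YYYYMM` naming scheme, the `analytics` dataset, and the `example-backup-bucket` bucket are hypothetical, and the script only prints `bq extract` commands rather than running them.

```python
from datetime import date


def monthly_table_ids(prefix: str, start: date, end: date) -> list[str]:
    """Return names like events_202401 for every month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def export_command(dataset: str, table: str, bucket: str) -> str:
    """Build a bq extract command that writes gzip-compressed CSV to Cloud Storage."""
    # The wildcard in the URI lets BigQuery shard a large export across several files.
    # The bucket layout (dataset/table/part-*.csv.gz) is an assumption for this sketch.
    uri = f"gs://{bucket}/{dataset}/{table}/part-*.csv.gz"
    return (
        "bq extract --destination_format=CSV --compression=GZIP "
        f"'{dataset}.{table}' '{uri}'"
    )


if __name__ == "__main__":
    # Hypothetical dataset, table prefix, and bucket names.
    for table in monthly_table_ids("events", date(2023, 11, 1), date(2024, 2, 1)):
        print(export_command("analytics", table, "example-backup-bucket"))
```

Because each month lives in its own table and its own Cloud Storage prefix, recovering from an ETL error found two weeks late would only require reloading the affected month's files, not the full history.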