You use BigQuery as your centralized analytics platform. New data is loaded every day, an…

Question

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

Accepted Answer

Correct answer: B. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.

Option B is correct because separate monthly tables make the data easier to manage and let you restore only the months that a faulty ETL run affected, while exporting and compressing the data keeps backup storage costs low in Cloud Storage. Option A uses a single table, which complicates management and means a recovery cannot be targeted at the affected period. Option C duplicates the data inside BigQuery, which could increase costs and complicate the backup process. Option D allows for recovery, but may not be as cost-effective as exporting and compressing the data for long-term storage.
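As a rough illustration of the approach in option B, the sketch below builds a per-month table name and the matching `bq extract` command that writes a gzip-compressed CSV export to Cloud Storage. The dataset, table, and bucket names, the `base_YYYYMM` naming scheme, and the object path layout are all assumptions made for this example, not part of the question.

```python
from datetime import date


def monthly_table_id(dataset: str, base_name: str, month: date) -> str:
    """Return a per-month table ID, e.g. 'analytics.events_202401'.

    The 'base_YYYYMM' suffix is an assumed naming convention.
    """
    return f"{dataset}.{base_name}_{month:%Y%m}"


def extract_command(dataset: str, base_name: str, month: date, bucket: str) -> list[str]:
    """Build a `bq extract` command that exports one monthly table
    to Cloud Storage as gzip-compressed CSV files."""
    table = monthly_table_id(dataset, base_name, month)
    # The wildcard lets BigQuery split a large export across several files.
    uri = f"gs://{bucket}/{base_name}/{month:%Y%m}/part-*.csv.gz"
    return [
        "bq", "extract",
        "--destination_format=CSV",
        "--compression=GZIP",
        table,
        uri,
    ]


if __name__ == "__main__":
    # Hypothetical names; print the command rather than running it.
    print(" ".join(extract_command("analytics", "events", date(2024, 1, 1), "my-backups")))
```

To actually run the export, the returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where the `bq` CLI is installed and authenticated. Restoring after a bad ETL run would then mean reloading only the affected month's files into its table.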

Google Cloud Professional Data Engineer — Question 114

Answer options

Correct answer: B

Explanation