Google Cloud Professional Data Engineer — Question 116
You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?
Answer options
- A. Store and process the entire dataset in BigQuery.
- B. Store and process the entire dataset in Bigtable.
- C. Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket.
- D. Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active.
Correct answer: C
Explanation
The correct answer is C. BigQuery holds the full dataset for data warehouse-style analytics in Google Cloud, and the compressed copy in Cloud Storage exposes the same data as files that batch analysis tools in other cloud providers can read. A is incorrect because BigQuery on its own does not expose the dataset as files. B is incorrect because Bigtable is a low-latency NoSQL database, not a data warehouse built for SQL analytics. D is incorrect because splitting the data means neither system holds the full dataset: BigQuery would cover only the active 20%, and the files in Cloud Storage would cover only the warm 80%.
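One way to produce the compressed copy described in option C is BigQuery's `EXPORT DATA` statement, which writes query results to Cloud Storage as files. The sketch below only builds the SQL text and does not run it. The table name, bucket name, and path prefix are hypothetical placeholders, and Parquet with Snappy compression is an assumed choice, not something the question specifies.

```python
def build_export_statement(table, bucket, prefix, fmt="PARQUET", compression="SNAPPY"):
    """Build a BigQuery EXPORT DATA statement that writes a table to Cloud Storage.

    The single '*' wildcard in the URI lets BigQuery shard the export
    across many files, which a large table requires.
    """
    uri = f"gs://{bucket}/{prefix}/*.{fmt.lower()}"
    return (
        "EXPORT DATA OPTIONS(\n"
        f"  uri='{uri}',\n"
        f"  format='{fmt}',\n"
        f"  compression='{compression}',\n"
        "  overwrite=true\n"
        ") AS\n"
        f"SELECT * FROM `{table}`"
    )


# Hypothetical names, for illustration only.
statement = build_export_statement(
    table="my-project.analytics.events",
    bucket="my-export-bucket",
    prefix="events",
)
print(statement)
```

The resulting statement could be run in the BigQuery console or submitted through a client library. The exported files then sit in the bucket, where tools outside Google Cloud can read them.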