Google Cloud Professional Data Engineer — Question 167
You migrated a data backend for an application that serves 10 PB of historical product data for analytics. Only the last known state for a product, which is about 10 GB of data, needs to be served through an API to the other applications. You need to choose a cost-effective persistent storage solution that can accommodate the analytics requirements and the API performance of up to 1000 queries per second (QPS) with less than 1 second latency. What should you do?
Answer options
- A. 1. Store the historical data in BigQuery for analytics. 2. Use a materialized view to precompute the last state of a product. 3. Serve the last state data directly from BigQuery to the API.
- B. 1. Store the products as a collection in Firestore with each product having a set of historical changes. 2. Use simple and compound queries for analytics. 3. Serve the last state data directly from Firestore to the API.
- C. 1. Store the historical data in Cloud SQL for analytics. 2. In a separate table, store the last state of the product after every product change. 3. Serve the last state data directly from Cloud SQL to the API.
- D. 1. Store the historical data in BigQuery for analytics. 2. In a Cloud SQL table, store the last state of the product after every product change. 3. Serve the last state data directly from Cloud SQL to the API.
Correct answer: D
Explanation
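Option D is correct because it separates the two workloads. BigQuery holds the 10 PB of historical data, since it is built for large-scale analytical scans. Cloud SQL holds the last-state table of about 10 GB, which fits comfortably in a relational database and can be served to the API through indexed point lookups at 1000 QPS with sub-second latency. Option A serves the API directly from BigQuery, which is designed for analytical queries rather than a sustained high rate of low-latency point lookups. Option B relies on Firestore's simple and compound queries, which are not suited to analytics over 10 PB of history. Option C places the 10 PB of history in Cloud SQL, which is far more than a Cloud SQL instance is designed to store.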
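The write path implied by option D can be sketched as follows. This is only an illustration under stated assumptions, not the real services: an in-memory SQLite database stands in for Cloud SQL, a Python list stands in for the append-only BigQuery history table, and the names product_last_state, record_change, and get_last_state are hypothetical.

```python
import sqlite3

# Stand-in for the BigQuery history table: append-only change events.
history = []

# Stand-in for the Cloud SQL last-state table (in-memory SQLite in this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_last_state ("
    "product_id TEXT PRIMARY KEY, "
    "state TEXT NOT NULL, "
    "updated_at INTEGER NOT NULL)"
)


def record_change(product_id, state, updated_at):
    """Append the change to history and overwrite the product's last state."""
    history.append((product_id, state, updated_at))
    # SQLite upsert; PostgreSQL or MySQL would use their own upsert syntax.
    conn.execute(
        "INSERT OR REPLACE INTO product_last_state "
        "(product_id, state, updated_at) VALUES (?, ?, ?)",
        (product_id, state, updated_at),
    )
    conn.commit()


def get_last_state(product_id):
    """Primary-key point lookup: the only query the API path needs."""
    return conn.execute(
        "SELECT state, updated_at FROM product_last_state WHERE product_id = ?",
        (product_id,),
    ).fetchone()


# Hypothetical sample events.
record_change("sku-1", "in_stock", 1)
record_change("sku-1", "sold_out", 2)
record_change("sku-2", "in_stock", 3)
```

After these three events, the history holds every change while the last-state table holds one row per product, so the API path never touches the large historical store.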