Google Cloud Professional Data Engineer — Question 219
You have a table that contains millions of rows of sales data, partitioned by date. Various applications and users query this data many times a minute. The query requires aggregating values by using AVG, MAX, and SUM, and does not require joining to other tables. The required aggregations are only computed over the past year of data, though you need to retain full historical data in the base tables. You want to ensure that the query results always include the latest data from the tables, while also reducing computation cost, maintenance overhead, and duration. What should you do?
Answer options
- A. Create a materialized view to aggregate the base table data. Include a filter clause to specify the last one year of partitions.
- B. Create a materialized view to aggregate the base table data. Configure a partition expiration on the base table to retain only the last one year of partitions.
- C. Create a view to aggregate the base table data. Include a filter clause to specify the last year of partitions.
- D. Create a new table that aggregates the base table data. Include a filter clause to specify the last year of partitions. Set up a scheduled query to recreate the new table every hour.
Correct answer: A
Explanation
Option A is correct because creating a materialized view allows for efficient aggregation of data while ensuring that the latest information is always included, specifically targeting the last year of partitions. Option B is incorrect as it suggests removing older data from the base table, which contradicts the requirement to retain full historical data. Option C is not suitable because a regular view does not store aggregated data, leading to higher computation costs during each query. Option D involves unnecessary complexity and overhead by creating a new table and scheduling regular updates, which is less efficient than using a materialized view.