Google Cloud Professional Data Engineer — Question 328
You are migrating a table to BigQuery and are deciding on the data model. Your table stores information related to purchases made across several store locations and includes information like the time of the transaction, items purchased, the store ID, and the city and state in which the store is located. You frequently query this table to see how many of each item were sold over the past 30 days and to look at purchasing trends by state, city, and individual store. How would you model this table for the best query performance?
Answer options
- A. Partition by transaction time; cluster by state first, then city, then store ID.
- B. Partition by transaction time; cluster by store ID first, then city, then state.
- C. Top-level cluster by state first, then city, then store ID.
- D. Top-level cluster by store ID first, then city, then state.
Correct answer: A
Explanation
The correct answer is A because partitioning by transaction time allows for efficient querying of recent data, while clustering by state, then city, and then store ID optimizes the performance for the specific queries involving geographic locations and trends. Options B and C do not prioritize state in the clustering, which is crucial for the types of queries mentioned, and D lacks the partitioning benefit for time-based queries.