Google Cloud Professional Data Engineer — Question 42
You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?
Answer options
- A. Create a table in BigQuery, and append the new samples for CPU and memory to the table
- B. Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second
- C. Create a narrow table in Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second
- D. Create a wide table in Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data
Correct answer: C
Explanation
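Option C is the best choice. Bigtable does not bill per query (you pay for provisioned nodes and storage), sustains very high write rates with low-latency reads, and a tall, narrow table with one row per computer per second is the layout Google recommends for time series data. Putting the computer identifier before the timestamp in the row key spreads writes from millions of machines across the key space instead of concentrating them on the newest timestamp, and keeps each computer's samples contiguous, so a time-range query becomes a single row-range scan. New computers and new samples simply add rows, so the dataset can grow without schema changes. Options A and B use BigQuery, whose on-demand pricing charges for the bytes each query processes, which conflicts with the requirement to avoid per-query charges; option B additionally depends on updating a row every second, which BigQuery is not designed for. Option D packs each minute's 60 samples into one row with a column per second, which ties the schema to a fixed bucket size and forces queries to unpack columns, making ad hoc analysis and future schema changes harder than with one row per sample.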
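The row key design in option C can be sketched in plain Python. This is a minimal illustration, not Bigtable client code: the key layout `<computer_id>#<epoch seconds, zero-padded to 10 digits>`, the `#` separator, the helper names `make_row_key` and `range_for`, and the sample IDs are all hypothetical. A sorted Python list stands in for Bigtable's key space, which stores rows in lexicographic order by row key.

```python
from datetime import datetime, timedelta, timezone


def make_row_key(computer_id: str, sample_time: datetime) -> bytes:
    """Build a narrow-table row key: one row per computer per one-second sample.

    Hypothetical layout: "<computer_id>#<epoch seconds, zero-padded to 10 digits>".
    The computer ID leads so writes from many machines spread across the key
    space instead of all landing on the newest timestamp.
    """
    epoch_seconds = int(sample_time.timestamp())
    # Zero-padding makes lexicographic (byte) order match chronological order.
    return f"{computer_id}#{epoch_seconds:010d}".encode("utf-8")


def range_for(computer_id: str, start: datetime, end: datetime) -> tuple[bytes, bytes]:
    """Return the half-open key range [start_key, end_key) for one computer."""
    return make_row_key(computer_id, start), make_row_key(computer_id, end)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Simulate Bigtable's lexicographically sorted key space with a sorted list.
    keys = sorted(
        make_row_key(cid, t0 + timedelta(seconds=s))
        for cid in ("vm-1", "vm-10", "vm-2")
        for s in range(5)
    )
    start_key, end_key = range_for("vm-1", t0 + timedelta(seconds=1), t0 + timedelta(seconds=3))
    hits = [k for k in keys if start_key <= k < end_key]
    print(hits)  # [b'vm-1#1704067201', b'vm-1#1704067202']
```

Because `#` sorts before any digit, keys for `vm-1` never interleave with keys for `vm-10`, so one computer's samples for a time window form a single contiguous range. Leading with the timestamp instead would send every machine's current writes to the same end of the key space.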