Google Cloud Professional Data Engineer — Question 203
You have a network of 1000 sensors. The sensors generate time series data: one metric per sensor per second, along with a timestamp. You already have 1 TB of data, and expect the data to grow by 1 GB every day. You need to access this data in two ways. The first access pattern requires retrieving the metric from one specific sensor stored at a specific timestamp, with a median single-digit millisecond latency. The second access pattern requires running complex analytic queries on the data, including joins, once a day. How should you store this data?
Answer options
- A. Store your data in BigQuery. Concatenate the sensor ID and timestamp, and use it as the primary key.
- B. Store your data in Bigtable. Concatenate the sensor ID and timestamp and use it as the row key. Perform an export to BigQuery every day.
- C. Store your data in Bigtable. Concatenate the sensor ID and metric, and use it as the row key. Perform an export to BigQuery every day.
- D. Store your data in BigQuery. Use the metric as a primary key.
Correct answer: B
Explanation
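Option B is correct because it serves each access pattern with the store suited to it. Bigtable is designed for low-latency point reads by row key, and a row key built from the sensor ID and timestamp makes each reading directly addressable, which satisfies the first access pattern. The daily export to BigQuery covers the second access pattern, since BigQuery supports complex analytic SQL, including joins. Options A and D keep the data only in BigQuery, which is built for analytical scans rather than single-row lookups at single-digit millisecond latency. Option C uses Bigtable but builds the row key from the sensor ID and metric, so a reading cannot be fetched by sensor and timestamp without scanning that sensor's rows.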
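The row key design in option B can be illustrated with a minimal sketch. It does not call the Bigtable client library: a plain Python dict stands in for the table, and the key layout (`sensor-` prefix, zero-padded ID, `#` separator, fixed-width UTC timestamp) and the helper names `make_row_key`, `write_reading`, and `read_reading` are assumptions chosen for illustration, not something the question specifies.

```python
from datetime import datetime, timezone

SEPARATOR = "#"  # assumed delimiter between the two key parts


def make_row_key(sensor_id, ts):
    """Return a key such as b'sensor-0042#20240101T000000Z' (hypothetical layout)."""
    # Zero-pad the ID and use a fixed-width UTC timestamp so that
    # lexicographic key order matches (sensor, time) order.
    stamp = ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"sensor-{sensor_id:04d}{SEPARATOR}{stamp}".encode("ascii")


# A plain dict stands in for the table: row key -> metric value.
table = {}


def write_reading(sensor_id, ts, metric):
    """Store one metric under its sensor-and-timestamp row key."""
    table[make_row_key(sensor_id, ts)] = metric


def read_reading(sensor_id, ts):
    """Point lookup: one sensor at one timestamp, or None if absent."""
    return table.get(make_row_key(sensor_id, ts))


t0 = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
write_reading(42, t0, 21.5)
write_reading(42, t0.replace(second=1), 21.7)
write_reading(7, t0, 19.0)

print(read_reading(42, t0))  # 21.5

# Sorted keys group each sensor's readings together in time order.
for key in sorted(table):
    print(key)
```

Because the sensor ID is fully known at read time, the first access pattern becomes a single-key lookup. Placing the sensor ID before the timestamp also keeps each sensor's readings contiguous in key order, and it is generally the recommended ordering for time series in Bigtable, since keys that begin with a timestamp tend to concentrate writes on one part of the key space.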