Google Cloud Professional Data Engineer — Question 234
You work for an airline and you need to store weather data in a BigQuery table. Weather data will be used as input to a machine learning model. The model only uses the last 30 days of weather data. You want to avoid storing unnecessary data and minimize costs. What should you do?
Answer options
- A. Create a BigQuery table where each record has an ingestion timestamp. Run a scheduled query to delete all the rows with an ingestion timestamp older than 30 days.
- B. Create a BigQuery table partitioned by datetime value of the weather date. Set up partition expiration to 30 days.
- C. Create a BigQuery table partitioned by ingestion time. Set up partition expiration to 30 days.
- D. Create a BigQuery table with a datetime column for the day the weather data refers to. Run a scheduled query to delete rows with a datetime value older than 30 days.
Correct answer: B
Explanation
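Option B is correct. If you partition the table on the weather date column and set a 30-day partition expiration, BigQuery drops each daily partition automatically once its weather date is more than 30 days old. This keeps only the data the model uses. Expired partitions are removed without running a query, so retention itself adds no query cost. Queries that filter on the weather date can also prune partitions and scan less data. Options A and D do run automatically, but they depend on a scheduled DELETE query. That query consumes query resources on every run (billed by bytes processed under on-demand pricing) and is one more job to maintain, while partition expiration gives the same retention for free. Option C expires data based on when it was loaded rather than on the date the weather refers to. Late-arriving or backfilled weather data older than 30 days would therefore stay in the table until 30 days after ingestion.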
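The sketch below illustrates option B and why option C differs. It uses only the standard library. It builds BigQuery `CREATE TABLE` DDL that partitions on a DATE column and sets `partition_expiration_days`. It also approximates which rows survive daily partition expiration under each partitioning key. The table ID, the column names (`weather_date`, `station_id`, `temperature_c`, `ingested_on`) and the sample rows are hypothetical, because the question does not specify a schema. The expiration logic is a simplified model: it treats a partition for date `d` as expired once `d + 30 days` has passed and ignores the delay before BigQuery physically removes data.

```python
from datetime import date, timedelta

# Hypothetical table ID; the question does not name a project, dataset, or table.
TABLE_ID = "my_project.weather.daily_weather"
RETENTION_DAYS = 30


def partitioned_table_ddl(table_id: str, retention_days: int) -> str:
    """Build BigQuery DDL for option B: partition on the weather date column
    and let BigQuery expire partitions older than retention_days.
    Column names are hypothetical."""
    return (
        f"CREATE TABLE `{table_id}` (\n"
        "  weather_date DATE NOT NULL,\n"
        "  station_id STRING,\n"
        "  temperature_c FLOAT64\n"
        ")\n"
        "PARTITION BY weather_date\n"
        f"OPTIONS (partition_expiration_days = {retention_days})"
    )


def retained_rows(rows, partition_key, today, retention_days=RETENTION_DAYS):
    """Approximate which rows survive daily partition expiration.

    A daily partition for date d is treated as expired once d + retention_days
    has passed, so rows whose partition date is on or before
    today - retention_days are dropped. Simplified model: it ignores the lag
    before BigQuery actually deletes expired partitions.
    """
    cutoff = today - timedelta(days=retention_days)
    return [row for row in rows if partition_key(row) > cutoff]


if __name__ == "__main__":
    print(partitioned_table_ddl(TABLE_ID, RETENTION_DAYS))

    today = date(2024, 6, 30)
    rows = [
        # Recent weather, loaded the next day (hypothetical sample row).
        {"weather_date": date(2024, 6, 20), "ingested_on": date(2024, 6, 21)},
        # Old weather backfilled recently: outside the model's 30-day window.
        {"weather_date": date(2024, 5, 1), "ingested_on": date(2024, 6, 25)},
    ]
    by_weather_date = retained_rows(rows, lambda r: r["weather_date"], today)  # option B
    by_ingestion = retained_rows(rows, lambda r: r["ingested_on"], today)      # option C
    print(len(by_weather_date), len(by_ingestion))
```

With these sample rows, partitioning by weather date keeps one row. Partitioning by ingestion time keeps both, including the backfilled May weather that the model never uses.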