Google Cloud Professional Machine Learning Engineer — Question 175
You work for a retail company. You have a managed tabular dataset in Vertex AI that contains sales data from three different stores. The dataset includes several features, such as store name and sale timestamp. You want to use the data to train a model that makes sales predictions for a new store that will open soon. You need to split the data between the training, validation, and test sets. What approach should you use to split the data?
Answer options
- A. Use Vertex AI manual split, using the store name feature to assign one store for each set
- B. Use Vertex AI default data split
- C. Use Vertex AI chronological split, and specify the sales timestamp feature as the time variable
- D. Use Vertex AI random split, assigning 70% of the rows to the training set, 10% to the validation set, and 20% to the test set
Correct answer: C
Explanation
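The correct answer is C. A chronological split orders the rows by the sales timestamp and assigns the earliest rows to the training set, the next rows to the validation set, and the most recent rows to the test set. The model is therefore evaluated on sales that occur after everything it was trained on, which mirrors how it will be used to predict future sales for the new store. Option D shuffles rows at random, so later sales can land in the training set while earlier ones land in the test set, which leaks future information and inflates the evaluation metrics. Option B has the same problem, because the Vertex AI default split for tabular data is a random split (80% training, 10% validation, 10% test). Option A ignores the time ordering and leaves only a single store's data for training.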
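To make the idea of a chronological split concrete, here is a minimal sketch in plain Python. It is not the Vertex AI implementation. The function name `chronological_split`, the column names `store_name`, `sale_timestamp`, and `amount`, and the sample rows are all hypothetical, and the 80/10/10 fractions are assumed for illustration.

```python
from datetime import datetime


def chronological_split(rows, time_key, train_frac=0.8, val_frac=0.1):
    """Sort rows by time, then cut them into train / validation / test.

    The oldest rows go to training, the next block to validation,
    and the most recent rows to test.
    """
    ordered = sorted(rows, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Hypothetical sales rows from three stores, deliberately given newest-first
# to show that the split depends on the timestamp, not on the input order.
sales = [
    {
        "store_name": f"store_{i % 3}",
        "sale_timestamp": datetime(2024, 1, 1 + i),
        "amount": 100 + i,
    }
    for i in reversed(range(10))
]

train, val, test = chronological_split(sales, "sale_timestamp")

# Every training row is older than every validation row,
# and every validation row is older than every test row.
latest_train = max(r["sale_timestamp"] for r in train)
earliest_val = min(r["sale_timestamp"] for r in val)
latest_val = max(r["sale_timestamp"] for r in val)
earliest_test = min(r["sale_timestamp"] for r in test)
```

In Vertex AI you would not write this yourself: selecting the chronological split and naming the time column asks the service to perform the ordering and cutting. The sketch only shows why no future rows end up in the training set.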