AWS Certified Machine Learning – Specialty — Question 211

A finance company needs to forecast the price of a commodity. The company has compiled a dataset of historical daily prices. A data scientist must train various forecasting models on 80% of the dataset and must validate the efficacy of those models on the remaining 20% of the dataset.

How should the data scientist split the dataset into a training dataset and a validation dataset to compare model performance?

Answer options

Correct answer: A

Explanation

The correct answer, A, selects a cutoff date such that 80% of the historical data precedes it. The data points before that date are used for training and the data points after it for validation. This keeps the split chronological, so each model is validated only on prices that come later than anything it was trained on, which mirrors how a forecast is used in practice. Option B is incorrect because it uses the data points after the date for training, which breaks the chronological order. Options C and D do not respect the temporal order of the data, so they are unsuitable for forecasting tasks.
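
The date-based split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name chronological_split, the (date, price) row layout, and the synthetic prices are all hypothetical and do not come from the question or from any AWS service.

```python
from datetime import date, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split (date, price) rows into training and validation sets by time order.

    Hypothetical helper for illustration: the earliest `train_fraction` of the
    rows go to training, and everything later goes to validation.
    """
    ordered = sorted(rows, key=lambda row: row[0])  # oldest first
    cutoff_index = int(len(ordered) * train_fraction)
    return ordered[:cutoff_index], ordered[cutoff_index:]


# Synthetic placeholder data: 100 consecutive days with made-up prices.
start = date(2023, 1, 1)
rows = [(start + timedelta(days=i), 100.0 + 0.5 * i) for i in range(100)]

train, validation = chronological_split(rows)
cutoff_date = validation[0][0]  # first date that belongs to validation

print(len(train), len(validation))
print("latest training date:", train[-1][0])
print("cutoff date:", cutoff_date)
```

No shuffling happens before the split. A random split would let prices from the validation period leak into training, which tends to make every model look better than it would on genuinely unseen future data.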