AWS Certified Machine Learning – Specialty — Question 318
A machine learning (ML) specialist at a manufacturing company uses Amazon SageMaker DeepAR to forecast input materials and energy requirements for the company. Most of the data in the training dataset is missing values for the target variable. The company stores the training dataset as JSON files.
The ML specialist must develop a solution by using Amazon SageMaker DeepAR to account for the missing values in the training dataset.
Which approach will meet these requirements with the LEAST development effort?
Answer options
- A. Impute the missing values by using the linear regression method. Use the entire dataset and the imputed values to train the DeepAR model.
- B. Replace the missing values with not a number (NaN). Use the entire dataset and the encoded missing values to train the DeepAR model.
- C. Impute the missing values by using a forward fill. Use the entire dataset and the imputed values to train the DeepAR model.
- D. Impute the missing values by using the mean value. Use the entire dataset and the imputed values to train the DeepAR model.
Correct answer: B
Explanation
Amazon SageMaker DeepAR natively supports missing values in the target time series when they are encoded in the JSON input as "NaN" (not a number) strings or as null literals. Because the algorithm handles these values itself during training, no manual imputation is needed. Options A, C, and D are incorrect because each one requires building a custom imputation preprocessing step, which adds development effort that the built-in capability makes unnecessary.
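As a rough illustration of option B, the sketch below builds one JSON Lines training record in which missing target values are written as "NaN" strings. The helper name `encode_series`, the start timestamp, and the sample values are hypothetical and chosen only for this example. It assumes the raw data marks gaps as `None` or as a floating-point NaN.

```python
import json
import math


def encode_series(start, target):
    """Return one JSON Lines record with missing targets encoded as "NaN" strings.

    Hypothetical helper for illustration; not part of any SageMaker SDK.
    """
    encoded = [
        # Treat None and float NaN as missing; keep every other value unchanged.
        "NaN" if v is None or (isinstance(v, float) and math.isnan(v)) else v
        for v in target
    ]
    return json.dumps({"start": start, "target": encoded})


# Made-up sample series with two gaps.
line = encode_series("2024-01-01 00:00:00", [12.5, None, float("nan"), 9.0])
print(line)
```

Converting the gaps to strings before serializing matters here: calling `json.dumps` directly on a float NaN emits a bare `NaN` token, which is not valid JSON.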