AWS Certified Machine Learning – Specialty — Question 273

An exercise analytics company wants to predict running speeds for its customers by using a dataset that contains multiple health-related features for each customer. Some of the features originate from sensors that provide extremely noisy values.

The company is training a regression model by using the built-in Amazon SageMaker linear learner algorithm to predict the running speeds. While the company is training the model, a data scientist observes that the training loss decreases to almost zero, but validation loss increases.

Which technique should the data scientist use to optimally fit the model?

Answer options

Correct answer: D

Explanation

The scenario describes classic overfitting: training loss falls to almost zero while validation loss rises, which means the model is fitting the noise in the training data rather than the underlying relationship. L2 regularization (weight decay) counters this by adding a penalty proportional to the sum of the squared weights to the loss. The penalty shrinks all weights toward zero, so no single noisy feature can dominate the model's predictions. In contrast, adding polynomial terms (Option C) would increase model complexity and worsen overfitting, while L1 regularization (Option A) drives some weights to exactly zero and is primarily used for feature selection rather than for smoothly damping the influence of noisy inputs.
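The effect described above can be reproduced on a small synthetic problem. The following is a minimal sketch, not the SageMaker linear learner itself: it uses NumPy least squares, and the dataset (one informative feature plus 39 pure-noise "sensor" features), the sample sizes, and the penalty strength `l2=5.0` are all assumptions chosen for illustration. The helper names `fit_linear`, `mse`, and `make_data` are hypothetical.

```python
import numpy as np

def fit_linear(X, y, l2=0.0):
    """Least-squares fit minimizing ||Xw - y||^2 + l2 * ||w||^2."""
    n_features = X.shape[1]
    # Appending sqrt(l2) * I as extra rows (with zero targets) turns the
    # L2-penalized problem into an ordinary least-squares problem.
    X_aug = np.vstack([X, np.sqrt(l2) * np.eye(n_features)])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def mse(X, y, w):
    """Mean squared error of the linear predictions X @ w."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (assumption): only feature 0 carries signal; the other
# 39 features are pure noise, standing in for unreliable sensors.
rng = np.random.default_rng(0)
n_train, n_val, n_features = 40, 1000, 40

def make_data(n):
    X = rng.normal(size=(n, n_features))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)
    return X, y

X_train, y_train = make_data(n_train)
X_val, y_val = make_data(n_val)

# With as many features as training rows, the unregularized model can
# fit the training set almost exactly, noise included.
w_plain = fit_linear(X_train, y_train, l2=0.0)
# The L2 penalty shrinks the weights and trades a little training error
# for better generalization.
w_l2 = fit_linear(X_train, y_train, l2=5.0)

for name, w in [("no regularization", w_plain), ("L2 regularization", w_l2)]:
    print(f"{name}: train MSE = {mse(X_train, y_train, w):.4f}, "
          f"validation MSE = {mse(X_val, y_val, w):.4f}, "
          f"weight norm = {np.linalg.norm(w):.3f}")
```

In the built-in SageMaker linear learner, the corresponding knob is the `wd` hyperparameter (L2 weight decay), while `l1` controls the L1 penalty. A suitable value is normally found by tuning against validation loss rather than fixed in advance.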