AWS Certified Machine Learning – Specialty — Question 305

A machine learning engineer is building a bird classification model. The engineer randomly splits a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model does not generalize well when it is evaluated on the validation dataset. The engineer realizes that the original dataset is imbalanced.

What should the engineer do to improve the validation accuracy of the model?

Answer options

Correct answer: A

Explanation

Stratified sampling ensures that the training and validation splits each preserve the class proportions of the original imbalanced dataset. With a purely random split, rare classes can end up under-represented or missing in one of the splits, so the model overfits to the majority classes and the validation results are unreliable. Acquiring more majority-class data would worsen the class imbalance rather than resolve it. Randomly downsizing the training set or using systematic sampling does not guarantee that minority classes are adequately represented in both splits.
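
To make the idea concrete, here is a minimal sketch of a stratified split using only the Python standard library. The function name `stratified_split`, the 90/10 sparrow/owl label list, and the 20% validation fraction are all illustrative assumptions, not part of the question. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument is the more likely choice.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx), splitting each class in the same proportion.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        # For classes with at least two samples, keep at least one sample
        # on each side of the split.
        if len(indices) > 1:
            n_val = min(max(n_val, 1), len(indices) - 1)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Assumed imbalanced dataset: 90 majority-class and 10 minority-class samples.
labels = ["sparrow"] * 90 + ["owl"] * 10
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print("train:", dict(train_counts))
print("validation:", dict(val_counts))
```

With these assumed inputs, both splits keep the original 9:1 ratio (72 sparrows and 8 owls for training, 18 sparrows and 2 owls for validation), whereas a plain random split could leave the validation set with few or no owls.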