AWS Certified Machine Learning – Specialty — Question 360
A global bank requires a solution to predict whether customers will leave the bank and choose another bank. The bank is using a dataset to train a model to predict customer churn. The training dataset has 1,000 rows, and it includes only 100 instances of customers who left the bank.
A machine learning (ML) specialist is using Amazon SageMaker Data Wrangler to prepare the data and a SageMaker training job to train a churn prediction model. After training, the ML specialist notices that the model returns only false results: it predicts that no customer will leave. The ML specialist must correct the model so that it returns more accurate predictions.
Which solution will meet these requirements?
Answer options
- A. Apply anomaly detection to remove outliers from the training dataset before training.
- B. Apply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training.
- C. Apply normalization to the features of the training dataset before training.
- D. Apply undersampling to the training dataset before training.
Correct answer: B
Explanation
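The training dataset is highly imbalanced: only 100 of the 1,000 rows (10%) represent the positive class (churn). A model can reach 90% accuracy simply by predicting "no churn" for every customer, which is why it leans entirely toward the majority class and returns only false results. SMOTE addresses the imbalance by creating new minority-class rows, each interpolated between an existing churn example and one of its nearest churn neighbors, so the classes become balanced without discarding any of the original data. Undersampling is inappropriate because balancing the classes by removing majority-class rows would shrink the dataset from 1,000 rows to about 200, leaving too little data for effective model training. Normalization only rescales the features and leaves the class ratio unchanged, and anomaly detection does not resolve class imbalance either; it could even discard some of the rare churn examples as outliers.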
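The reasoning in the explanation can be illustrated with a small, self-contained sketch. It assumes hypothetical two-feature customer rows generated at random (not real bank data), shows that an "always no churn" model scores 90% accuracy while catching 0% of churners, and implements a minimal SMOTE-style oversampler in plain Python. The functions `smote` and `all_false_metrics` are illustrative names, not part of any AWS API; in practice you would more likely use the Balance Data transform in Data Wrangler (which offers a SMOTE option) or a library such as imbalanced-learn.

```python
import math
import random


def all_false_metrics(labels):
    """Accuracy and churn recall of a model that always predicts False (no churn)."""
    preds = [False] * len(labels)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    true_pos = sum(p and y for p, y in zip(preds, labels))
    recall = true_pos / sum(labels)  # fraction of actual churners that were caught
    return accuracy, recall


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE-style oversampler (illustrative sketch, brute-force neighbors).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority-class neighbors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a row cannot have more neighbors than other rows
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical data mirroring the question: 900 customers stayed, 100 churned.
# Two random features per row stand in for real customer attributes.
data_rng = random.Random(42)
stayed = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(900)]
churned = [[data_rng.gauss(2.0, 1.0), data_rng.gauss(2.0, 1.0)] for _ in range(100)]
labels = [False] * len(stayed) + [True] * len(churned)

acc, rec = all_false_metrics(labels)
print(f"all-false baseline: accuracy={acc:.2f}, churn recall={rec:.2f}")
# prints: all-false baseline: accuracy=0.90, churn recall=0.00

# Oversample churners until both classes have 900 rows.
new_churn = smote(churned, n_synthetic=len(stayed) - len(churned))
print("after SMOTE:", len(stayed), "stayed vs", len(churned) + len(new_churn), "churned")
# prints: after SMOTE: 900 stayed vs 900 churned

# For comparison, undersampling would keep only 100 rows of each class.
print("after undersampling:", 2 * len(churned), "rows in total")
# prints: after undersampling: 200 rows in total
```

The brute-force neighbor search is chosen for readability; it is fine for 100 minority rows but would be replaced by a proper nearest-neighbor index on larger data. Oversampling should be applied only to the training split, because evaluating on synthetic rows would inflate the reported metrics.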