AWS Certified Machine Learning – Specialty — Question 360

A global bank requires a solution to predict whether customers will leave the bank and choose another bank. The bank is using a dataset to train a model to predict customer loss. The training dataset has 1,000 rows. The training dataset includes 100 instances of customers who left the bank.

A machine learning (ML) specialist is using Amazon SageMaker Data Wrangler to train a churn prediction model by using a SageMaker training job. After training, the ML specialist notices that the model returns only false results. The ML specialist must correct the model so that it returns more accurate predictions.

Which solution will meet these requirements?

Answer options

Correct answer: B

Explanation

The training dataset is imbalanced: only 100 of its 1,000 rows (10%) represent the positive class (churn). A model that always predicts the majority class is therefore correct on 90% of the training rows, which explains why the trained model returns only false results. The Synthetic Minority Oversampling Technique (SMOTE) addresses this by generating new minority-class instances, interpolated between existing minority-class rows and their nearest minority-class neighbors, so the classes are balanced without discarding any rows. Undersampling is a poor fit because fully balancing the classes would shrink the already small 1,000-row dataset to about 200 rows, leaving too little data for effective training. Normalization and anomaly detection do not resolve class imbalance.
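
The oversampling idea can be illustrated with a minimal pure-Python sketch. It is not the Data Wrangler implementation. The `smote` helper, the two-feature toy data, and the Gaussian parameters are all hypothetical; only the class counts (900 retained, 100 churned) come from the question.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Illustrative SMOTE: interpolate between a minority point and one of
    its k nearest minority-class neighbors (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the base point, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy data (assumed): two numeric features, class counts taken from the question.
rng = random.Random(42)
stayed = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(900)]
churned = [(rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)) for _ in range(100)]

# Accuracy of a degenerate model that always predicts "no churn".
baseline_accuracy = len(stayed) / (len(stayed) + len(churned))

# Generate enough synthetic churn rows to match the majority class.
new_churned = smote(churned, n_new=len(stayed) - len(churned))
balanced_churned = churned + new_churned

print(f"always-false accuracy: {baseline_accuracy:.0%}")
print(f"churned rows after SMOTE: {len(balanced_churned)}")
```

Because each synthetic row lies on a line segment between two real churn rows, the minority class grows without duplicating rows exactly and without removing any majority rows. In practice this step would more likely be applied through Data Wrangler's built-in balance transform, which offers a SMOTE option, or through a library such as imbalanced-learn, rather than hand-written code.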