AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 112

An ML engineer is training an ML model to identify people’s health risk based on 20 features and 1 target. The target has two classes:

• Likely to have health risk (positive class)
• Unlikely to have health risk (negative class)

The people in the dataset are 30 to 60 years old. Age is one of the features.

The ML engineer analyzes the features. For the positive class, the difference in proportions of labels (DPL) is +0.9 for the age range of 40 to 45 compared with all other age ranges.
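To make the +0.9 concrete, here is a minimal sketch that computes DPL by hand. It assumes the definition used for pretraining bias metrics such as those in Amazon SageMaker Clarify: DPL = q_a - q_d, where q_a is the share of positive labels in facet a (here, ages 40 to 45) and q_d is the share of positive labels in facet d (all other ages). DPL ranges from -1 to +1, so +0.9 is close to the maximum. The function names and the toy records are hypothetical; in practice a bias-analysis tool would compute the metric for you.

```python
def dpl(records, in_facet_a):
    """Difference in proportions of labels for the positive label (1):
    DPL = q_a - q_d, where q_a and q_d are the shares of positive labels
    inside and outside facet a."""
    facet_a = [label for age, label in records if in_facet_a(age)]
    facet_d = [label for age, label in records if not in_facet_a(age)]
    q_a = sum(facet_a) / len(facet_a)
    q_d = sum(facet_d) / len(facet_d)
    return q_a - q_d


def is_40_to_45(age):
    """Hypothetical facet a: people aged 40 to 45 inclusive."""
    return 40 <= age <= 45


# Hypothetical toy data: (age, label), label 1 = likely to have health risk.
records = (
    [(42, 1)] * 95 + [(42, 0)] * 5      # ages 40-45: 95 of 100 positive
    + [(35, 1)] * 5 + [(55, 0)] * 95    # other ages: 5 of 100 positive
)
print(round(dpl(records, is_40_to_45), 2))  # 0.9
```

Here q_a = 0.95 and q_d = 0.05, so the 40 to 45 group has 90 percentage points more positive labels than everyone else.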

What should the ML engineer do to correct this data imbalance?

Answer options

Correct answer: B

Explanation

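The correct answer is B. A DPL of +0.9 means that the proportion of positive labels in the 40 to 45 age group is 0.9 (90 percentage points) higher than the proportion in all other age groups. Undersampling the positive class for the age range of 40 to 45 reduces the overrepresentation of positive labels within this group, which moves the DPL toward zero. The oversampling or undersampling actions in options A, C, and D would either worsen this imbalance or adjust the class distributions in a way that does not correct it, which could bias the model.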
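The undersampling step can be sketched as follows. This is a minimal illustration, assuming simple random undersampling with Python's `random.sample`; the helper `undersample_facet_positives`, the `target_q_a` parameter, the target value of 0.5, and the toy records are all hypothetical choices, not a prescribed procedure. The sketch drops positive records only inside the 40 to 45 group and leaves every other record unchanged, then recomputes DPL.

```python
import random


def dpl(records, in_facet_a):
    """DPL = q_a - q_d, where q is the share of positive labels (1) in a facet."""
    facet_a = [label for age, label in records if in_facet_a(age)]
    facet_d = [label for age, label in records if not in_facet_a(age)]
    return sum(facet_a) / len(facet_a) - sum(facet_d) / len(facet_d)


def is_40_to_45(age):
    """Hypothetical facet a: people aged 40 to 45 inclusive."""
    return 40 <= age <= 45


def undersample_facet_positives(records, in_facet_a, target_q_a, seed=0):
    """Randomly drop positive records inside facet a so that its share of
    positive labels is at most target_q_a. Other records are kept unchanged."""
    if not 0 <= target_q_a < 1:
        raise ValueError("target_q_a must be in [0, 1)")
    pos_a = [r for r in records if in_facet_a(r[0]) and r[1] == 1]
    neg_a = [r for r in records if in_facet_a(r[0]) and r[1] == 0]
    rest = [r for r in records if not in_facet_a(r[0])]
    # Largest k with k / (k + len(neg_a)) <= target_q_a.
    keep = min(len(pos_a), int(target_q_a * len(neg_a) / (1 - target_q_a)))
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return rng.sample(pos_a, keep) + neg_a + rest


# Hypothetical toy data: (age, label), label 1 = likely to have health risk.
records = (
    [(42, 1)] * 95 + [(42, 0)] * 5      # facet a (40-45): q_a = 0.95
    + [(35, 1)] * 5 + [(55, 0)] * 95    # facet d (others): q_d = 0.05
)
balanced = undersample_facet_positives(records, is_40_to_45, target_q_a=0.5)
print(round(dpl(records, is_40_to_45), 2))   # 0.9
print(round(dpl(balanced, is_40_to_45), 2))  # 0.45
print(len(records), len(balanced))           # 200 110
```

In this toy data, bringing DPL all the way to zero by undersampling alone would discard nearly every positive record in the 40 to 45 group, which is why the sketch takes the target proportion as a parameter rather than fixing it.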