AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 14

Case study:
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data.
Which solution will meet this requirement with the LEAST operational effort?

Answer options

Correct answer: D

Explanation

The correct answer is D because Amazon SageMaker Data Wrangler includes a built-in balance data transform (random oversampling, random undersampling, or SMOTE) that can be applied without writing custom code, so it resolves the class imbalance with the least operational effort. Options A and B involve more complex processes that require additional steps to build and maintain. Option C, although helpful, does not offer the same level of integration and simplicity as Data Wrangler.
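
To make concrete what "balancing" a dataset means, here is a minimal sketch of random oversampling in plain Python. It is not Data Wrangler's API or implementation. The function name `random_oversample`, the `is_fraud` label, and the toy transaction rows are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    if not rows:
        return []
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    # Every class is topped up to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k is 0 for the majority class).
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 95 legitimate transactions and 5 fraudulent ones.
transactions = (
    [{"amount": 10 + i, "is_fraud": 0} for i in range(95)]
    + [{"amount": 900 + i, "is_fraud": 1} for i in range(5)]
)

balanced = random_oversample(transactions, "is_fraud")
counts = Counter(row["is_fraud"] for row in balanced)
print(counts)
```

After the call, both classes contain 95 rows. Balancing of this kind is normally applied to the training split only, so that duplicated minority rows do not leak into validation or test data.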