Case study - An ML engineer is developing a fraud detection model on AWS. The training da…

Question

Case study -
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model.
Which action will meet this requirement with the LEAST operational overhead?

Accepted Answer

Correct answer: C. Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.

Amazon SageMaker Data Wrangler provides tools designed specifically for data preparation, so it can transform categorical data into numerical data efficiently, and that transformation is essential for model training. Option A is valid, but it relies on AWS Glue, which may require more operational overhead than SageMaker Data Wrangler. Options B and D are unsuitable because they propose the opposite transformations, which do not align with the requirement to maximize model accuracy.
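To make "transforming categorical data into numerical data" concrete, here is a minimal sketch of one common approach, one-hot encoding, in plain Python. It is not Data Wrangler's implementation and does not call any AWS API; the `one_hot_encode` helper and the `amount` and `channel` field names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def one_hot_encode(rows: List[Dict[str, object]], column: str) -> List[Dict[str, object]]:
    """Replace one categorical column with 0/1 indicator columns."""
    # Collect the distinct categories in a stable, sorted order.
    categories = sorted({str(row[column]) for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per category: 1 if it matches, else 0.
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if str(row[column]) == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical transaction records (illustrative values, not real data).
transactions = [
    {"amount": 120.5, "channel": "web"},
    {"amount": 15.0, "channel": "pos"},
    {"amount": 980.0, "channel": "mobile"},
]

encoded_transactions = one_hot_encode(transactions, "channel")
```

Each record keeps its numerical `amount` field and gains one indicator column per distinct `channel` value, so the whole record becomes numerical. A managed tool such as Data Wrangler would apply a comparable transform without hand-written code, which is where the lower operational overhead comes from.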

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 13
