AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 13
Case study
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model.
Which action will meet this requirement with the LEAST operational overhead?
Answer options
- A. Use AWS Glue to transform the categorical data into numerical data.
- B. Use AWS Glue to transform the numerical data into categorical data.
- C. Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.
- D. Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.
Correct answer: C
Explanation
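The correct answer is C. Most ML algorithms operate on numerical inputs, so categorical features must be encoded as numbers (for example, with one-hot or ordinal encoding) before training. Amazon SageMaker Data Wrangler provides built-in transforms for this, including categorical encoding, through a visual interface within SageMaker, so the ML engineer can prepare the data with little or no custom code. Option A performs the right transformation, but AWS Glue typically requires authoring, running, and maintaining ETL jobs, which adds operational overhead compared with Data Wrangler. Options B and D apply the opposite transformation: converting numerical data into categorical data discards information the model could learn from and does not address the need for numerical inputs, so neither helps maximize accuracy.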
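To make the categorical-to-numerical transformation concrete, here is a minimal pure-Python sketch of one-hot encoding. It is not Data Wrangler's API, only an illustration of the kind of result a one-hot encoding transform produces. The `one_hot_encode` helper, the sample rows, and the column names (`amount`, `channel`, `is_fraud`) are hypothetical and do not come from the case study.

```python
def one_hot_encode(rows, column):
    """Replace `column` in each row with one 0/1 indicator per category."""
    # Collect the distinct categories; sorting keeps the output order stable.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Hypothetical transaction rows, for illustration only.
transactions = [
    {"amount": 12.50, "channel": "web", "is_fraud": 0},
    {"amount": 980.00, "channel": "pos", "is_fraud": 1},
    {"amount": 43.10, "channel": "mobile", "is_fraud": 0},
]

encoded = one_hot_encode(transactions, "channel")
for row in encoded:
    print(row)
```

Each row keeps its numerical fields and gains one indicator column per category (`channel_mobile`, `channel_pos`, `channel_web`), so every feature is numerical and no artificial ordering is imposed on the categories.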