AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 12
Case study
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
After the data is aggregated, the ML engineer must implement a solution to automatically detect anomalies in the data and to visualize the result.
Which solution will meet these requirements?
Answer options
- A. Use Amazon Athena to automatically detect the anomalies and to visualize the result.
- B. Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.
- C. Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.
- D. Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.
Correct answer: C
Explanation
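The correct answer is C. Amazon SageMaker Data Wrangler is built for data preparation and analysis in ML workflows, and it covers both requirements in one tool: its built-in analyses, such as the Data Quality and Insights Report and the time series anomaly detection visualization, flag anomalous records and present the results as charts. The other options assign anomaly detection to services that do not provide it. Amazon Athena (A) is a SQL query service with no built-in anomaly detection or visualization. Amazon Redshift Spectrum (B) queries data in Amazon S3 from Amazon Redshift, and AWS Batch (D) runs batch compute jobs; neither detects anomalies on its own, so pairing them with Amazon QuickSight for visualization still leaves the detection requirement unmet.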
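To make "automatically detect anomalies" concrete, here is a minimal Python sketch of one common approach: a modified z-score based on the median absolute deviation. This is an illustration only and is not how Data Wrangler works internally. The function name `flag_anomalies`, the threshold of 3.5, and the sample transaction amounts are all assumptions made for this example.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)  # median absolute deviation
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up transaction amounts; one value is far outside the usual range.
amounts = [12.5, 14.0, 13.2, 15.1, 12.9, 980.0, 13.7, 14.4]
print(flag_anomalies(amounts))
```

In a managed tool such as Data Wrangler, this kind of scoring and the accompanying chart are produced without custom code, which is the point of option C.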