AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 55
An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.
Which solution will meet these requirements?
Answer options
- A. Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.
- B. Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.
- C. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.
- D. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.
Correct answer: A
Explanation
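A is correct because Amazon SageMaker Data Wrangler covers both requirements in one tool: it imports data from multiple sources, combines datasets through join and concatenate steps, and provides built-in transforms for handling missing values, dropping duplicates, and handling outliers. B and D are incorrect because Amazon SageMaker Ground Truth and SageMaker data labeling are meant for annotating data with human reviewers, not for consolidating or cleansing it. C could work, but it depends on manual import and merge steps and on custom code, whereas Data Wrangler provides these capabilities built in.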
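To make the preparation steps concrete, here is a minimal pandas sketch of the same kind of work: consolidate several datasets, drop duplicates, fill missing values, and limit extreme outliers. It is not Data Wrangler's API or its generated code. The function name `consolidate_and_clean`, the column names, the median imputation, and the 1.5 × IQR clipping rule are all assumptions chosen for illustration.

```python
import pandas as pd


def consolidate_and_clean(frames, numeric_cols, iqr_factor=1.5):
    """Hypothetical helper: merge frames, then dedupe, impute, and clip outliers."""
    # Consolidate all datasets into a single data frame.
    df = pd.concat(frames, ignore_index=True)

    # Drop exact duplicate rows.
    df = df.drop_duplicates().reset_index(drop=True)

    for col in numeric_cols:
        # Fill missing values with the column median (one possible strategy).
        df[col] = df[col].fillna(df[col].median())

        # Clip values outside the IQR fences (an assumed outlier rule).
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - iqr_factor * iqr,
                               upper=q3 + iqr_factor * iqr)
    return df


# Illustrative input: one missing value, one duplicate row, one extreme outlier.
sales_a = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, None, 12.0]})
sales_b = pd.DataFrame({"id": [3, 4, 5], "value": [12.0, 11.0, 1000.0]})

clean = consolidate_and_clean([sales_a, sales_b], numeric_cols=["value"])
print(clean)
```

In Data Wrangler, the equivalent steps are configured as transforms in a data flow rather than written by hand, which is why option A involves less manual effort than option C.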