AWS Certified Machine Learning – Specialty — Question 297
A company uses sensors on devices such as motor engines and factory machines to measure parameters such as temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce service outages.
A machine learning (ML) specialist needs to gather the sensor data to train a model to predict device malfunctions. The ML specialist must ensure that the data does not contain outliers before training the model.
How can the ML specialist meet these requirements with the LEAST operational overhead?
Answer options
- A. Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartiles. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.
- B. Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.
- C. Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.
- D. Use Amazon Lookout for Equipment to find and remove outliers from the dataset.
Correct answer: C
Explanation
Amazon SageMaker Data Wrangler includes a built-in anomaly detection visualization that lets the ML specialist identify outliers without writing custom code. The specialist can then add a built-in transformation to the same Data Wrangler data flow to remove those outliers, which keeps operational overhead low. The other options fall short. Option A requires manual calculation in a notebook, and removing every value outside the first and third quartiles would discard roughly half of the data rather than only the outliers. Option B misapplies the bias report, which measures bias in a dataset rather than detecting outliers. Option D relies on Amazon Lookout for Equipment, a managed service for predictive maintenance that is not a general-purpose tool for preparing and cleaning data flows.
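To illustrate why the filter in Option A is not an outlier filter, here is a minimal pandas sketch. It is not Data Wrangler's API or its internal method. It assumes the common 1.5 × IQR fence rule as a stand-in for an outlier-removal transform, and the column name `temperature_c`, the helper names, and the sample readings are all hypothetical.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


def keep_only_interquartile(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Option A's filter: keep only rows between Q1 and Q3 (inclusive)."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    return df[df[column].between(q1, q3)]


# Hypothetical sensor readings: nine plausible values and one obvious spike.
readings = pd.DataFrame(
    {"temperature_c": [70.1, 69.8, 70.4, 70.0, 69.9, 70.2, 70.3, 69.7, 70.5, 250.0]}
)

cleaned = remove_outliers_iqr(readings, "temperature_c")
naive = keep_only_interquartile(readings, "temperature_c")

print(f"original rows: {len(readings)}")
print(f"rows after 1.5*IQR fences: {len(cleaned)}")
print(f"rows after keeping only Q1..Q3: {len(naive)}")
```

On this sample, the fence-based filter drops only the 250.0 reading, while the quartile-only filter discards more than half of the rows, including valid readings.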