A company is building a machine learning (ML) model to classify images of plants. An ML s…

Question

A company is building a machine learning (ML) model to classify images of plants. An ML specialist has trained the model using the Amazon SageMaker built-in Image Classification algorithm. The model is hosted using a SageMaker endpoint on an ml.m5.xlarge instance for real-time inference. When used by researchers in the field, the inference has greater latency than is acceptable. The latency gets worse when multiple researchers perform inference at the same time on their devices. Using Amazon CloudWatch metrics, the ML specialist notices that the ModelLatency metric shows a high value and is responsible for most of the response latency. The ML specialist needs to fix the performance issue so that researchers can experience less latency when performing inference from their devices. Which action should the ML specialist take to meet this requirement?

Accepted Answer

Correct answer: B. Attach an Amazon Elastic Inference ml.eia2.medium accelerator to the endpoint instance. The ml.m5.xlarge instance is CPU-only, and an Elastic Inference accelerator adds GPU-powered acceleration for inference to that instance, which shortens the time the model takes to compute each prediction. Because ModelLatency (the time the model container takes to respond) accounts for most of the response latency, speeding up inference compute addresses the bottleneck directly. Options A and D may not provide the performance boost needed under high concurrency, while C does not address the latency issue, since Autopilot is oriented toward model training and tuning rather than real-time inference performance.
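
As a rough illustration of option B, the sketch below builds a SageMaker endpoint configuration that keeps the ml.m5.xlarge instance and adds the ml.eia2.medium accelerator through the `AcceleratorType` field of a production variant, then points the existing endpoint at it. The helper names, the variant name "AllTraffic", and the endpoint, configuration, and model names are hypothetical placeholders, not values from the question. It assumes boto3 is installed and AWS credentials are configured.

```python
def build_endpoint_config(config_name, model_name,
                          instance_type="ml.m5.xlarge",
                          accelerator_type="ml.eia2.medium",
                          instance_count=1):
    """Build the request for SageMaker CreateEndpointConfig.

    The accelerator is attached per production variant via AcceleratorType.
    All names passed in are hypothetical placeholders.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "AcceleratorType": accelerator_type,
            }
        ],
    }


def apply_accelerator(endpoint_name, config_name, model_name):
    """Create the new configuration and switch the endpoint to it."""
    import boto3  # imported here so the builder above works without boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(**build_endpoint_config(config_name, model_name))
    # Repoint the existing endpoint at the accelerator-backed configuration.
    sm.update_endpoint(EndpointName=endpoint_name,
                       EndpointConfigName=config_name)
```

Calling `apply_accelerator("plants-endpoint", "plants-ei-config", "plants-model")` would apply the change against a real account. After the update completes, the ModelLatency metric in CloudWatch can be compared with its earlier values to check whether the accelerator helped.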

AWS Certified Machine Learning – Specialty — Question 188
