AWS Certified Machine Learning – Specialty — Question 188
A company is building a machine learning (ML) model to classify images of plants. An ML specialist has trained the model using the Amazon SageMaker built-in Image Classification algorithm. The model is hosted using a SageMaker endpoint on an ml.m5.xlarge instance for real-time inference. When used by researchers in the field, the inference has greater latency than is acceptable. The latency gets worse when multiple researchers perform inference at the same time on their devices. Using Amazon CloudWatch metrics, the ML specialist notices that the ModelLatency metric shows a high value and is responsible for most of the response latency.
The ML specialist needs to fix the performance issue so that researchers can experience less latency when performing inference from their devices.
Which action should the ML specialist take to meet this requirement?
Answer options
- A. Change the endpoint instance to an ml.t3 burstable instance with the same vCPU number as the ml.m5.xlarge instance has.
- B. Attach an Amazon Elastic Inference ml.eia2.medium accelerator to the endpoint instance.
- C. Enable Amazon SageMaker Autopilot to automatically tune performance of the model.
- D. Change the endpoint instance to use a memory optimized ML instance.
Correct answer: B
Explanation
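The correct answer is B. The ModelLatency metric measures the time the model container takes to respond to a request, so a high value points to the inference computation itself, not to network or SageMaker overhead. The built-in Image Classification algorithm is a convolutional neural network, and its inference is compute-bound on a CPU-only ml.m5.xlarge instance. Attaching an Amazon Elastic Inference accelerator adds GPU-powered acceleration to the existing endpoint instance, which shortens each inference and in turn lets the endpoint handle concurrent requests with less queuing.
Option A does not help: an ml.t3 instance with the same vCPU count adds no compute capacity, and burstable instances are designed for intermittent rather than sustained CPU use. Option D adds memory, but the bottleneck here is compute, not memory. Option C is unrelated to the problem: SageMaker Autopilot is an AutoML feature for building and training models, and it does not tune the performance of an already deployed real-time endpoint.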
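As a rough illustration of option B, the sketch below builds an endpoint configuration that keeps the ml.m5.xlarge instance and adds an ml.eia2.medium accelerator through the `AcceleratorType` field of a production variant, then applies it with boto3. The helper names `build_endpoint_config` and `apply_config`, and the config, model, and endpoint names, are hypothetical and not part of the question; the AWS calls are only made if `apply_config` is invoked with valid credentials.

```python
def build_endpoint_config(config_name, model_name,
                          instance_type="ml.m5.xlarge",
                          accelerator_type="ml.eia2.medium"):
    """Return keyword arguments for SageMaker's create_endpoint_config.

    The instance type is unchanged; only an Elastic Inference
    accelerator is added to the production variant.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
                "AcceleratorType": accelerator_type,
            }
        ],
    }


def apply_config(endpoint_name, request):
    """Create the new config and point an existing endpoint at it.

    boto3 is imported lazily so that building the request above
    does not require the AWS SDK or credentials.
    """
    import boto3

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_endpoint_config(**request)
    sagemaker.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=request["EndpointConfigName"],
    )


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    request = build_endpoint_config("plants-classifier-ei", "plants-classifier")
    print(request["ProductionVariants"][0])
    # To apply against a real account (assumes the endpoint already exists):
    # apply_config("plants-endpoint", request)
```

Updating the endpoint to a new configuration, instead of editing the instance in place, reflects how SageMaker endpoints are changed in general: the endpoint configuration is immutable, so a new one is created and the endpoint is switched over to it.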