A company has trained an ML model in Amazon SageMaker. The company needs to host the mode…

Question

A company has trained an ML model in Amazon SageMaker. The company needs to host the model to provide inferences in a production environment.
The model must be highly available and must respond with minimum latency. The size of each request will be between 1 KB and 3 MB. The model will receive unpredictable bursts of requests during the day. The inferences must adapt proportionally to the changes in demand.
How should the company deploy the model into production to meet these requirements?

Accepted Answer

Correct answer: A. Create a SageMaker real-time inference endpoint. Configure auto scaling. Configure the endpoint to present the existing model.

A SageMaker real-time inference endpoint is designed for low-latency, highly available inference, and requests of 1 KB to 3 MB fit within its payload limit. With auto scaling configured, the endpoint adds or removes instances in proportion to request volume, which covers the unpredictable bursts during the day. Option B is less suitable because ECS scheduled scaling based on CPU may not respond quickly enough to unpredictable demand. Option C uses EKS and therefore does not leverage SageMaker's built-in optimizations for real-time inference. Option D relies on Spot Instances, which can be reclaimed and may cause availability problems during periods of high demand.
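The "Configure auto scaling" step in option A can be sketched with Application Auto Scaling and a target-tracking policy on invocations per instance. This is a minimal illustration, not the exam's reference solution: the endpoint name, the variant name "AllTraffic", the capacity bounds, the target value, and the cooldowns are all assumed values, and the helper function names are hypothetical. The boto3 import is kept inside the function that calls AWS so the request builders can be inspected without credentials.

```python
"""Sketch: target-tracking auto scaling for a SageMaker real-time endpoint."""


def variant_resource_id(endpoint_name, variant_name):
    """Resource ID format that Application Auto Scaling uses for a variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target_request(endpoint_name, variant_name, min_instances, max_instances):
    """Build kwargs for register_scalable_target (instance count bounds)."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def target_tracking_policy_request(endpoint_name, variant_name, invocations_per_instance):
    """Build kwargs for put_scaling_policy (scale with request volume)."""
    return {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Assumed target: tune from load tests of the actual model.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            # Assumed cooldowns: scale out quickly, scale in conservatively.
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 300,
        },
    }


def apply_auto_scaling(endpoint_name, variant_name="AllTraffic",
                       min_instances=2, max_instances=10,
                       invocations_per_instance=100):
    """Register the variant and attach the policy (requires AWS credentials)."""
    import boto3  # imported lazily so the builders above work offline

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        **scalable_target_request(endpoint_name, variant_name,
                                  min_instances, max_instances))
    client.put_scaling_policy(
        **target_tracking_policy_request(endpoint_name, variant_name,
                                         invocations_per_instance))
```

Calling `apply_auto_scaling("my-endpoint")` against an existing endpoint would register the variant and attach the policy; "my-endpoint" is a placeholder. A minimum of two instances is assumed here because a single instance is a single point of failure, which would conflict with the high-availability requirement.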

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 35
