AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 134
An ML engineer needs to deploy a trained model that is based on a genetic algorithm. The algorithm solves a complex problem and can take several minutes to generate predictions.
When the model is deployed, the model needs to access large amounts of data to process requests. The requests can involve as much as 100 MB of data.
Which deployment solution will meet these requirements with the LEAST operational overhead?
Answer options
- A. Deploy the model to Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer.
- B. Deploy the model to an Amazon SageMaker real-time endpoint.
- C. Deploy the model to an Amazon SageMaker Asynchronous Inference endpoint.
- D. Package the model as a container. Deploy the model to Amazon Elastic Container Service (Amazon ECS) on Amazon EC2 instances.
Correct answer: C
Explanation
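The correct answer is C. Amazon SageMaker Asynchronous Inference queues incoming requests and processes them in the background, reading each request payload from Amazon S3 and writing the result back to S3. It is documented to support payloads of up to 1 GB and processing times of up to one hour, which covers both the 100 MB requests and the multi-minute predictions. Because it is a managed endpoint, there are no servers or load balancers to operate. Option B is also managed, but real-time endpoints are built for synchronous, low-latency requests: their payload limit is far below 100 MB, and a standard invocation times out after about 60 seconds. Options A and D could be made to work, but both leave the engineer responsible for provisioning, scaling, and patching EC2 instances (and, for D, the ECS cluster and container image as well), which is more operational overhead than a managed endpoint.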
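To make option C concrete, the following is a minimal sketch of how an asynchronous endpoint might be configured and invoked with boto3. The function names, the `ml.m5.xlarge` instance type, the concurrency setting, the 900-second timeout, and the content type are assumptions chosen for illustration, not values from the question. The clients are passed in as parameters, so the block itself does not import boto3 or need AWS credentials.

```python
from typing import Any, Dict


def build_async_endpoint_config(
    config_name: str, model_name: str, output_s3_uri: str
) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker CreateEndpointConfig call."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": "ml.m5.xlarge",  # assumed instance type
                "InitialInstanceCount": 1,
            }
        ],
        # Including AsyncInferenceConfig is what makes the endpoint asynchronous.
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": output_s3_uri},
            # Assumed: one request at a time per instance for a heavy model.
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 1},
        },
    }


def build_async_invocation(
    endpoint_name: str, input_s3_uri: str, timeout_seconds: int = 900
) -> Dict[str, Any]:
    """Build keyword arguments for the runtime InvokeEndpointAsync call."""
    return {
        "EndpointName": endpoint_name,
        # The large payload is staged in S3 and passed by reference.
        "InputLocation": input_s3_uri,
        "ContentType": "application/octet-stream",  # assumed content type
        "InvocationTimeoutSeconds": timeout_seconds,  # assumed; several minutes
    }


def deploy_and_invoke(
    sm_client: Any,
    runtime_client: Any,
    endpoint_name: str,
    model_name: str,
    input_s3_uri: str,
    output_s3_uri: str,
) -> str:
    """Create the endpoint, wait for it, and submit one asynchronous request.

    Returns the S3 URI where the result will be written once inference
    finishes; the call does not wait for the prediction itself.
    """
    config_name = f"{endpoint_name}-config"  # hypothetical naming scheme
    sm_client.create_endpoint_config(
        **build_async_endpoint_config(config_name, model_name, output_s3_uri)
    )
    sm_client.create_endpoint(
        EndpointName=endpoint_name, EndpointConfigName=config_name
    )
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    response = runtime_client.invoke_endpoint_async(
        **build_async_invocation(endpoint_name, input_s3_uri)
    )
    return response["OutputLocation"]
```

With real clients, `sm_client` would be `boto3.client("sagemaker")` and `runtime_client` would be `boto3.client("sagemaker-runtime")`. The caller uploads the request data to S3 first, then reads the prediction from the returned output location after processing completes.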