AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 128
A company wants to use Amazon SageMaker to host an ML model that runs on CPU for real-time predictions. The model will have intermittent traffic during business hours and will have periods of no traffic after business hours. The company needs a solution that will serve inference requests in the most cost-effective manner.
Which hosting option will meet these requirements?
Answer options
- A. Deploy the model to a SageMaker real-time endpoint. Add a schedule-based auto scaling policy to handle traffic surges during business hours.
- B. Deploy the model to a SageMaker Serverless Inference endpoint. Configure increased provisioned concurrency during business hours.
- C. Deploy the model to a SageMaker Asynchronous Inference endpoint. Configure an auto scaling policy that scales in to zero outside business hours.
- D. Deploy the model to a SageMaker real-time endpoint. Create a scheduled AWS Lambda function that activates the endpoint during business hours only.
Correct answer: B
Explanation
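The correct answer is B. A SageMaker Serverless Inference endpoint serves synchronous, real-time requests on CPU, scales automatically with traffic, and scales in to zero when idle, so on-demand usage is billed only for the compute consumed while processing requests. Raising provisioned concurrency during business hours keeps capacity warm and reduces cold-start latency when the intermittent traffic arrives, while nothing needs to stay provisioned after hours. Option A is less cost-effective because a real-time endpoint is billed for its provisioned instances whether or not requests arrive, and a schedule-based scaling policy typically still leaves at least one instance running after hours. Option C can scale in to zero, but Asynchronous Inference queues requests and writes results to Amazon S3, which suits large payloads and long-running jobs rather than real-time predictions. Option D adds operational complexity: a real-time endpoint has no built-in pause or activate state, so the scheduled Lambda function would have to delete and re-create the endpoint each day, and the endpoint would still bill for idle instances between requests during business hours.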
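As an illustration of option B, the following is a minimal sketch of how a serverless endpoint configuration could be built and deployed with boto3. The helper names, the resource names (`my-cpu-model`, `my-serverless-config`, `my-serverless-endpoint`), and the memory and concurrency values are assumptions chosen for the example and do not come from the question. The `ServerlessConfig` fields are the ones accepted by `create_endpoint_config`.

```python
# Sketch only: helper names and all resource names are hypothetical.
VALID_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=5,
                                     provisioned_concurrency=None):
    """Build the request for sagemaker.create_endpoint_config (no AWS call)."""
    if memory_mb not in VALID_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {VALID_MEMORY_MB}")
    serverless = {
        "MemorySizeInMB": memory_mb,
        "MaxConcurrency": max_concurrency,
    }
    if provisioned_concurrency is not None:
        # Provisioned concurrency cannot exceed the variant's max concurrency.
        if provisioned_concurrency > max_concurrency:
            raise ValueError("provisioned_concurrency exceeds max_concurrency")
        serverless["ProvisionedConcurrency"] = provisioned_concurrency
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # ServerlessConfig replaces InstanceType/InitialInstanceCount.
                "ServerlessConfig": serverless,
            }
        ],
    }


def deploy(request, endpoint_name):
    """Create the endpoint config and endpoint (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(**request)
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=request["EndpointConfigName"],
    )


# Example request with a small amount of warm capacity (assumed values).
request = build_serverless_endpoint_config(
    "my-serverless-config", "my-cpu-model",
    memory_mb=2048, max_concurrency=5, provisioned_concurrency=1,
)
# deploy(request, "my-serverless-endpoint")  # uncomment to call AWS
```

Keeping the request builder separate from the AWS call makes the configuration easy to inspect before anything is created. Omitting `provisioned_concurrency` yields a purely on-demand serverless endpoint, which is the cheapest setting but is more exposed to cold starts.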