AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 128

A company wants to use Amazon SageMaker to host an ML model that runs on CPU for real-time predictions. The model will have intermittent traffic during business hours and will have periods of no traffic after business hours. The company needs a solution that will serve inference requests in the most cost-effective manner.

Which hosting option will meet these requirements?

Answer options

Correct answer: B

Explanation

The correct answer is B because a SageMaker Serverless Inference endpoint scales automatically with request volume and scales down to zero when there is no traffic, so the company pays for compute only while requests are being processed and not for idle capacity after business hours. Serverless Inference runs on CPU, which matches the model in this scenario. Option A can handle the traffic but does not scale down efficiently after hours. Option C uses asynchronous inference, which queues requests and is not suitable for real-time predictions. Option D adds unnecessary complexity with a scheduled Lambda function and is less cost-effective than option B.
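
To make option B concrete, here is a minimal sketch of how a serverless endpoint configuration might be built with boto3. The names demo-config, demo-model, and demo-endpoint, the helper functions, and the default memory and concurrency values are illustrative assumptions and are not part of the question. The point is that the production variant carries a ServerlessConfig block in place of an instance type and instance count.

```python
# Sketch only: resource names and default sizes below are hypothetical.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=5):
    """Build the request body for SageMaker create_endpoint_config.

    The variant uses ServerlessConfig instead of InstanceType and
    InitialInstanceCount, so no instances stay provisioned while idle.
    """
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,  # assumes this model already exists
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def deploy(endpoint_name, config):
    """Create the endpoint config and endpoint (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(**config)
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config["EndpointConfigName"],
    )


if __name__ == "__main__":
    cfg = build_serverless_endpoint_config("demo-config", "demo-model")
    print(cfg)
    # deploy("demo-endpoint", cfg)  # uncomment to create real resources
```

The first request after an idle period may see added latency from a cold start, which is usually an acceptable trade-off for the intermittent traffic described in the question.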