A media company wants to deploy a machine learning (ML) model that uses Amazon SageMaker…

Question

A media company wants to deploy a machine learning (ML) model that uses Amazon SageMaker to recommend new articles to the company’s readers. The company's readers are primarily located in a single city. The company notices that the heaviest reader traffic predictably occurs early in the morning, after lunch, and again after work hours. There is very little traffic at other times of day. The media company needs to minimize the time required to deliver recommendations to its readers. The expected amount of data that the API call will return for inference is less than 4 MB. Which solution will meet these requirements in the MOST cost-effective way?

Accepted Answer

Correct answer: B. Serverless inference with provisioned concurrency. Amazon SageMaker Serverless Inference is cost-effective for workloads with long idle periods because on-demand usage is billed for the compute time spent processing requests rather than for idle instances. Enabling provisioned concurrency keeps capacity warm, which avoids cold-start latency during the predictable peak hours and keeps recommendation delivery fast. The expected payload of less than 4 MB also fits within the payload size limit of a serverless endpoint. A real-time endpoint would incur higher costs while idle, and Asynchronous Inference and batch transform do not meet the low-latency requirement for real-time reader recommendations.
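As a rough illustration of the accepted answer, the sketch below builds the request for a serverless endpoint configuration with provisioned concurrency, using the `ServerlessConfig` fields of the SageMaker `CreateEndpointConfig` API. The resource names (`news-recs-config`, `news-recs-model`, `news-recs-endpoint`) and the memory and concurrency values are assumptions chosen for the example, not values from the question.

```python
from typing import Any, Dict

# Memory sizes accepted by SageMaker Serverless Inference, in MB.
ALLOWED_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}


def build_serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 2048,              # assumed value for illustration
    max_concurrency: int = 20,          # assumed value for illustration
    provisioned_concurrency: int = 10,  # assumed value for illustration
) -> Dict[str, Any]:
    """Return keyword arguments for the SageMaker create_endpoint_config call."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {sorted(ALLOWED_MEMORY_MB)}")
    # Provisioned concurrency cannot exceed the endpoint's maximum concurrency.
    if not 1 <= provisioned_concurrency <= max_concurrency:
        raise ValueError("provisioned_concurrency must be between 1 and max_concurrency")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                    "ProvisionedConcurrency": provisioned_concurrency,
                },
            }
        ],
    }


def deploy(endpoint_name: str, request: Dict[str, Any]) -> None:
    """Create the endpoint config and endpoint; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(**request)
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=request["EndpointConfigName"],
    )


if __name__ == "__main__":
    request = build_serverless_endpoint_config("news-recs-config", "news-recs-model")
    print(request["ProductionVariants"][0]["ServerlessConfig"])
    # deploy("news-recs-endpoint", request)  # uncomment to create real AWS resources
```

Provisioned concurrency is billed while it is enabled, so in this scenario it would make sense to keep it for the three predictable traffic peaks only and lower it outside them.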

AWS Certified Machine Learning – Specialty — Question 359

Explanation