AWS Certified Machine Learning – Specialty — Question 319

An ecommerce company wants to update a production real-time machine learning (ML) recommendation engine API that uses Amazon SageMaker. The company wants to release a new model but does not want to change the applications that rely on the API. The company also wants to evaluate the new model's performance on production traffic before rolling it out to all users.

Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: B

Explanation

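Amazon SageMaker production variants let you host multiple models behind a single endpoint and split traffic among them by assigning a weight to each variant. Because the endpoint name and invocation API stay the same, client applications need no changes, and SageMaker handles the routing itself, which keeps operational overhead low. Each variant receives a share of live requests proportional to its weight, and SageMaker reports metrics per variant, so the new model can be evaluated on production traffic before its weight is raised to 100%. Options A and D add unnecessary complexity by requiring additional load balancers (ALB/NLB) and multiple endpoints to manage. Option C is incorrect because SageMaker batch transform is designed for offline inference over large datasets, not for splitting real-time API traffic.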
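To make the weighting concrete, here is a minimal Python sketch of how a two-variant endpoint configuration might be built. The model names (`rec-v1`, `rec-v2`), variant names, instance type, and the 10% starting weight are illustrative assumptions, not values from the question. The helper functions are hypothetical; only the `boto3` calls `create_endpoint_config` and `update_endpoint` and the `ProductionVariants` field names come from the SageMaker API. `boto3` is imported lazily, so the pure helpers run without AWS credentials.

```python
from typing import Dict, List


def build_production_variants(current_model: str, new_model: str,
                              new_model_weight: float = 0.1,
                              instance_type: str = "ml.m5.large") -> List[Dict]:
    """Build a ProductionVariants list hosting two models on one endpoint.

    Model names, instance type, and the default weight are assumed values.
    """
    if not 0.0 <= new_model_weight <= 1.0:
        raise ValueError("new_model_weight must be between 0 and 1")
    return [
        {
            "VariantName": "current",
            "ModelName": current_model,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": 1.0 - new_model_weight,
        },
        {
            "VariantName": "new",
            "ModelName": new_model,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": new_model_weight,
        },
    ]


def traffic_shares(variants: List[Dict]) -> Dict[str, float]:
    """Each variant's traffic share is its weight divided by the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


def apply_to_endpoint(endpoint_name: str, config_name: str,
                      variants: List[Dict]) -> None:
    """Point an existing endpoint at a new config (requires AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3 installed

    client = boto3.client("sagemaker")
    client.create_endpoint_config(EndpointConfigName=config_name,
                                  ProductionVariants=variants)
    # The endpoint name is unchanged, so client applications keep calling it as before.
    client.update_endpoint(EndpointName=endpoint_name,
                           EndpointConfigName=config_name)


if __name__ == "__main__":
    variants = build_production_variants("rec-v1", "rec-v2", new_model_weight=0.1)
    print(traffic_shares(variants))
```

Once the new variant's metrics look acceptable, its weight can be raised step by step. SageMaker also provides `update_endpoint_weights_and_capacities` for changing weights without creating a new endpoint configuration.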