AWS Certified Machine Learning – Specialty — Question 304

An online delivery company wants to choose the fastest courier for each delivery at the moment an order is placed. The company wants to implement this feature for existing users and new users of its application. Data scientists have trained separate models with XGBoost for this purpose, and the models are stored in Amazon S3. There is one model for each city where the company operates.

Operations engineers host these models on Amazon EC2 to respond to web client requests, with one instance for each model, but the instances run at only 5% CPU and memory utilization. The operations engineers want to avoid managing unnecessary resources.

Which solution will enable the company to achieve its goal with the LEAST operational overhead?

Answer options

Correct answer: B

Explanation

Amazon SageMaker multi-model endpoints (MME) host multiple models behind a single endpoint and load each model from Amazon S3 on demand, so many lightly used models can share the same instances. This reduces both operational overhead and cost. Option B is correct because it uses an MME to host all of the city-specific XGBoost models on one endpoint and selects the model for each request with the TargetModel parameter. Option A is incorrect because batch transform is meant for offline inference and cannot select a courier in real time when an order is placed. Option C increases operational overhead because the team would still have to manage an EC2 instance and a model server themselves. Option D keeps a separate endpoint for each city, so it is costly and does not address the low utilization or the resource management burden.
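
To illustrate how a request is routed to one city's model on a multi-model endpoint, here is a minimal Python sketch. It assumes a SageMaker Runtime client whose invoke_endpoint call accepts EndpointName, TargetModel, ContentType, and Body. The endpoint name, the "<city>.tar.gz" artifact naming, the CSV feature layout, and the single-number response are hypothetical choices made for this example, not details given in the question.

```python
ENDPOINT_NAME = "courier-selection-mme"  # hypothetical endpoint name


def build_invoke_args(city, features):
    """Build keyword arguments for the SageMaker Runtime invoke_endpoint call."""
    return {
        "EndpointName": ENDPOINT_NAME,
        # Artifact path relative to the endpoint's S3 model prefix.
        # The "<city>.tar.gz" naming is an assumption for this sketch.
        "TargetModel": f"{city}.tar.gz",
        "ContentType": "text/csv",
        # One CSV row of features; the feature layout is hypothetical.
        "Body": ",".join(str(value) for value in features),
    }


def predict(client, city, features):
    """Invoke the model for `city` and return its prediction as a float.

    Assumes the model returns a single numeric value for a single input row.
    """
    response = client.invoke_endpoint(**build_invoke_args(city, features))
    return float(response["Body"].read().decode("utf-8"))
```

In practice the client would typically be created with boto3.client("sagemaker-runtime"), and predict(client, "seattle", [3.2, 17, 0]) would send the row to the model stored as seattle.tar.gz. Because only the TargetModel value changes between cities, adding a city means uploading one more artifact to the S3 prefix rather than provisioning another instance or endpoint.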