AWS Certified Machine Learning – Specialty — Question 304
An online delivery company wants to choose the fastest courier for each delivery at the moment an order is placed. The company wants to implement this feature for existing users and new users of its application. Data scientists have trained separate models with XGBoost for this purpose, and the models are stored in Amazon S3. There is one model for each city where the company operates.
Operations engineers host these models on Amazon EC2 to respond to web client requests, with one instance for each model, but the instances show only 5% CPU and memory utilization. The operations engineers want to avoid managing unnecessary resources.
Which solution will enable the company to achieve its goal with the LEAST operational overhead?
Answer options
- A. Create an Amazon SageMaker notebook instance for pulling all the models from Amazon S3 using the boto3 library. Remove the existing instances and use the notebook to perform a SageMaker batch transform for performing inferences offline for all the possible users in all the cities. Store the results in different files in Amazon S3. Point the web client to the files.
- B. Prepare an Amazon SageMaker Docker container based on the open-source multi-model server. Remove the existing instances and create a multi-model endpoint in SageMaker instead, pointing to the S3 bucket containing all the models. Invoke the endpoint from the web client at runtime, specifying the TargetModel parameter according to the city of each request.
- C. Keep only a single EC2 instance for hosting all the models. Install a model server in the instance and load each model by pulling it from Amazon S3. Integrate the instance with the web client using Amazon API Gateway for responding to the requests in real time, specifying the target resource according to the city of each request.
- D. Prepare a Docker container based on the prebuilt images in Amazon SageMaker. Replace the existing instances with separate SageMaker endpoints, one for each city where the company operates. Invoke the endpoints from the web client, specifying the URL and EndpointName parameter according to the city of each request.
Correct answer: B
Explanation
Amazon SageMaker multi-model endpoints (MME) host many models behind a single endpoint and a shared serving container, loading each model from Amazon S3 on demand. Because the models share the same instances instead of each occupying its own, MME directly addresses the 5% utilization problem, and SageMaker manages the hosting infrastructure.
- Option B is correct because one multi-model endpoint serves all the city-specific XGBoost models in real time, and each request selects its model through the TargetModel parameter.
- Option A is incorrect because batch transform performs offline inference. Precomputed results cannot cover new users or reflect conditions at the moment an order is placed.
- Option C consolidates the models onto one instance, but the team must still manage the EC2 instance, the model server, model loading, and the API Gateway integration, so operational overhead remains high.
- Option D moves hosting to SageMaker but keeps one endpoint per city. Each endpoint runs on its own instances, so the low utilization persists and the number of resources to manage does not shrink.
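The runtime call described in option B might look like the minimal sketch below. It assumes a client created with `boto3.client("sagemaker-runtime")` and uses that client's `invoke_endpoint` call with its `TargetModel` argument. Everything else is hypothetical and chosen only for illustration: the helper names, the `<city>.tar.gz` artifact naming scheme, the endpoint name `courier-mme`, and the CSV feature payload.

```python
import re
from typing import Sequence


def target_model_for_city(city: str) -> str:
    """Map a city name to a model artifact path relative to the endpoint's S3 prefix.

    The "<city>.tar.gz" naming scheme is an assumption, not part of the question.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", city.strip().lower()).strip("-")
    if not slug:
        raise ValueError("city must contain at least one alphanumeric character")
    return f"{slug}.tar.gz"


def invoke_city_model(runtime_client, endpoint_name: str, city: str,
                      features: Sequence[float]) -> str:
    """Call the multi-model endpoint, selecting the per-city model via TargetModel.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    Returns the raw response body as text.
    """
    # Assumed payload format: one CSV row of numeric features.
    body = ",".join(str(x) for x in features).encode("utf-8")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model_for_city(city),  # picks the model for this city
        ContentType="text/csv",
        Body=body,
    )
    return response["Body"].read().decode("utf-8")
```

With real AWS credentials, usage would be along the lines of `invoke_city_model(boto3.client("sagemaker-runtime"), "courier-mme", "New York", [3.2, 1.0, 14.0])`. Only the `TargetModel` value changes from city to city; the endpoint name stays the same, which is what distinguishes this approach from the per-city endpoints in option D.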