AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 119
A company is developing a new online application to gather information from customers. An ML engineer has developed a new ML model that will determine a score for each customer. The model will use the score to determine which product to display to the customer. The ML engineer needs to minimize response-time latency for the model.
How should the ML engineer deploy the application in Amazon SageMaker to meet these requirements?
Answer options
- A. Configure batch transform.
- B. Configure a real-time inference endpoint.
- C. Configure a serverless inference endpoint.
- D. Configure an asynchronous inference endpoint.
Correct answer: B
Explanation
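The correct choice is B. A real-time inference endpoint in Amazon SageMaker serves requests synchronously from provisioned instances that stay running, so it returns low-latency responses. That suits an application that must score each customer immediately. Option A (batch transform) processes whole datasets offline and does not serve individual requests. Option D (asynchronous inference) queues incoming requests and writes results to Amazon S3, which suits large payloads and long processing times rather than immediate responses. Option C (serverless inference) can suit intermittent traffic, but it scales capacity down when idle, so requests can incur cold-start delays. It therefore does not consistently deliver the lowest latency that a real-time endpoint provides.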
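To make the difference between options B and C concrete, here is a minimal Python sketch that assumes the request shape of the SageMaker `CreateEndpointConfig` API as called through boto3. The helper functions, the config and model names, the instance type `ml.m5.large`, and the serverless memory and concurrency values are hypothetical. The functions only build parameter dictionaries and make no AWS calls.

```python
"""Hypothetical helpers that build SageMaker CreateEndpointConfig parameters.

The dictionary keys follow the CreateEndpointConfig request shape; all
names, instance types, and sizes below are illustrative assumptions.
"""


def realtime_endpoint_config(config_name: str, model_name: str,
                             instance_type: str = "ml.m5.large",
                             instance_count: int = 1) -> dict:
    # Real-time endpoint (option B): dedicated instances stay provisioned,
    # so each request is answered synchronously without a cold start.
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }


def serverless_endpoint_config(config_name: str, model_name: str,
                               memory_mb: int = 2048,
                               max_concurrency: int = 5) -> dict:
    # Serverless endpoint (option C): no instance type is specified; capacity
    # is managed for you and can scale down when idle, so the next request
    # after an idle period may see cold-start latency.
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


if __name__ == "__main__":
    params = realtime_endpoint_config("scoring-rt-config", "customer-scoring-model")
    print(params["ProductionVariants"][0]["InstanceType"])
```

In an actual deployment, the dictionary would be passed as `boto3.client("sagemaker").create_endpoint_config(**params)`, followed by `create_endpoint`. The application would then score each customer with `boto3.client("sagemaker-runtime").invoke_endpoint(...)`. The key design point is that the real-time variant pins provisioned instances, while the serverless variant trades that always-on capacity for scale-to-zero.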