AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 48

An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day.
The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage.
Which solution will meet these requirements?

Answer options

Correct answer: D

Explanation

Option D is correct because it scales the number of instances with actual demand, which keeps costs low during quiet periods and preserves capacity to respond at peak. Option A is incorrect because fixed concurrency does not adapt to varying request volumes. Option B is incorrect because it maintains a static number of tasks and therefore cannot scale. Option C is incorrect because it does not provide auto scaling, which is needed to handle fluctuating request rates.
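The cost argument can be illustrated with a small sketch. It is a simplified model of demand-based scaling, not the algorithm any AWS service actually runs. The function name, the per-instance target of 1,000 requests per minute, the instance limits, and the sample load figures are all hypothetical values chosen for illustration.

```python
import math


def desired_instances(requests_per_minute, target_per_instance,
                      min_instances, max_instances):
    """Return how many instances a simple demand-based policy would run.

    Hypothetical helper: scales so each instance handles roughly
    `target_per_instance` requests per minute, clamped to the allowed range.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(requests_per_minute / target_per_instance)
    return max(min_instances, min(max_instances, needed))


# Made-up request rates for five periods of an uneven day.
load_per_period = [0, 50, 900, 4200, 300]

# Instances the scaling policy would run in each period (min 1, max 4).
scaled_plan = [desired_instances(load, 1000, 1, 4) for load in load_per_period]

# A static fleet must be sized for the peak in every period.
static_plan = [max(scaled_plan)] * len(load_per_period)

scaled_cost = sum(scaled_plan)   # instance-periods paid for with scaling
static_cost = sum(static_plan)   # instance-periods paid for without scaling
```

Under these assumed numbers the scaled fleet runs at its minimum for most of the day and grows only for the peak period, while the static fleet pays for peak capacity in every period. That is the trade-off the explanation describes when it rules out the fixed and non-scaling options.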