An ML engineer is configuring auto scaling for an inference component of a model that run…

Question

An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing. The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day. Which solution will meet this requirement?

Accepted Answer

Correct answer: D. Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day. Option D is correct because it ensures that at least one instance is available when the first requests of the day arrive, which prevents the startup delay. The target tracking policy on invocations per model cannot do this by itself: with zero instances running there are no invocations to measure, so the tracked metric never triggers a scale-out. The other options either adjust scaling parameters without addressing the zero-instance state or rely on metrics that do not restore initial capacity.
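The approach in option D can be sketched with boto3 as follows. This is a minimal sketch under stated assumptions, not a definitive setup: the inference component name, policy name, alarm name, and region are hypothetical, the scalable target is assumed to be registered already with a minimum capacity of zero, and the `NoCapacityInvocationFailures` metric with the `InferenceComponentName` dimension is an assumption to confirm against the current SageMaker AI documentation before use.

```python
"""Sketch: let a SageMaker AI inference component scale out from zero copies."""

# Hypothetical names, used only for illustration.
INFERENCE_COMPONENT = "my-inference-component"
RESOURCE_ID = f"inference-component/{INFERENCE_COMPONENT}"
DIMENSION = "sagemaker:inference-component:DesiredCopyCount"


def step_policy_request(resource_id: str = RESOURCE_ID) -> dict:
    """Build the arguments for Application Auto Scaling put_scaling_policy."""
    return {
        "PolicyName": "scale-out-from-zero",  # hypothetical policy name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": DIMENSION,
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Maximum",
            "Cooldown": 60,
            # Add one copy whenever the alarm threshold is breached.
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}
            ],
        },
    }


def alarm_request(policy_arn: str, component: str = INFERENCE_COMPONENT) -> dict:
    """Build the arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{component}-no-capacity",  # hypothetical alarm name
        "AlarmActions": [policy_arn],  # the alarm invokes the step policy
        "Namespace": "AWS/SageMaker",
        # Assumed metric: requests that failed because no copy was running.
        "MetricName": "NoCapacityInvocationFailures",
        "Dimensions": [{"Name": "InferenceComponentName", "Value": component}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "missing",
    }


def apply(region: str = "us-east-1") -> None:
    """Create the step scaling policy, then the alarm that triggers it."""
    import boto3  # imported lazily so the builders above work without AWS access

    autoscaling = boto3.client("application-autoscaling", region_name=region)
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    policy = autoscaling.put_scaling_policy(**step_policy_request())
    cloudwatch.put_metric_alarm(**alarm_request(policy["PolicyARN"]))
```

Calling `apply()` requires AWS credentials and an existing scalable target. The two builder functions only return dictionaries, so the request shapes can be inspected without touching an account. The existing target tracking policy stays in place and continues to handle scaling once at least one copy is running.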

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 197

Answer options

Explanation