AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 197
An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing.
The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day.
Which solution will meet this requirement?
Answer options
- A. Reduce the SageMaker AI auto scaling cooldown period to the minimum supported value. Add an auto scaling lifecycle hook to scale the SageMaker AI instances.
- B. Change the target metric to CPU utilization.
- C. Modify the scaling policy target value to one.
- D. Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day.
Correct answer: D
Explanation
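Option D is correct because it is the only option that adds capacity when the endpoint has scaled in to zero. The target tracking policy depends on an invocations-per-model metric, and with zero instances running there is no data for that metric to trigger a scale-out. A step scaling policy driven by an Amazon CloudWatch alarm does not have this limitation, and the second alarm and policy raise capacity from zero to one at the start of each business day so that the first requests are served without delay.
The other options leave the zero-instance problem in place. Option A changes only how quickly successive scaling activities can occur, and lifecycle hooks are a feature of Amazon EC2 Auto Scaling rather than of SageMaker AI endpoint auto scaling. Option B swaps in CPU utilization, which is also unavailable when no instances are running. Option C makes target tracking more aggressive during normal operation but still relies on a metric that has no data at zero capacity.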
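
The scale-out-from-zero part of option D can be sketched roughly as follows, using boto3 parameter dictionaries for Application Auto Scaling and CloudWatch. This is a minimal sketch under stated assumptions, not a definitive configuration: the component name `my-inference-component`, the policy and alarm names, the cooldown, and the alarm thresholds are all hypothetical, and the `NoCapacityInvocationFailures` metric is assumed to be the signal that requests arrived while no capacity was available. The builder functions make no AWS calls; only `apply_scale_from_zero` does.

```python
"""Sketch: step scaling from zero for a SageMaker AI inference component."""


def build_step_scaling_policy(component_name: str) -> dict:
    """Keyword arguments for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": f"{component_name}-scale-out-from-zero",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Maximum",
            "Cooldown": 60,  # assumed value, in seconds
            # Add one copy whenever the alarm below is breached.
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}
            ],
        },
    }


def build_no_capacity_alarm(component_name: str, policy_arn: str) -> dict:
    """Keyword arguments for cloudwatch put_metric_alarm."""
    return {
        "AlarmName": f"{component_name}-no-capacity",  # hypothetical name
        "Namespace": "AWS/SageMaker",
        # Assumed metric: invocations that failed because no capacity existed.
        "MetricName": "NoCapacityInvocationFailures",
        "Dimensions": [
            {"Name": "InferenceComponentName", "Value": component_name}
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "missing",
        # The alarm action is the step scaling policy itself.
        "AlarmActions": [policy_arn],
    }


def apply_scale_from_zero(component_name: str) -> None:
    """Create the policy and alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builders work without boto3 installed

    autoscaling = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    response = autoscaling.put_scaling_policy(
        **build_step_scaling_policy(component_name)
    )
    cloudwatch.put_metric_alarm(
        **build_no_capacity_alarm(component_name, response["PolicyARN"])
    )
```

Calling `apply_scale_from_zero("my-inference-component")` would register both resources, assuming the inference component is already registered as a scalable target with a minimum capacity of zero. The existing target tracking policy can stay in place to handle scaling during normal business hours.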