AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 145
A company uses an Amazon SageMaker AI ML model to make real-time inferences. The company has configured auto scaling for the Amazon EC2 instances that SageMaker AI uses for the inferences.
During times of peak usage, new instances launch before existing instances are fully ready. As a result, the model experiences inefficiencies and delays.
Which solution will optimize the scaling process without affecting response times?
Answer options
- A. Change to a multi-model endpoint configuration in SageMaker AI.
- B. Integrate Amazon API Gateway and AWS Lambda to manage invocations of the SageMaker AI inference endpoint.
- C. Decrease the cooldown period for scale-in activities. Increase the maximum number of instances.
- D. Increase the cooldown period after scale-out activities.
Correct answer: D
Explanation
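Increasing the cooldown period after scale-out activities gives instances that are already launching time to become fully operational before auto scaling adds more. While the scale-out cooldown is in effect, the capacity added by the previous scale-out activity has time to take effect before another scale-out starts, so the endpoint does not keep launching instances while earlier ones are still warming up. The other options do not address this timing issue. Option A changes how models are hosted, not when instances launch. Option B adds an invocation layer in front of the endpoint without changing scaling behavior. Option C shortens the scale-in cooldown and raises the instance ceiling, neither of which prevents new instances from launching before earlier ones are ready.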
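As an illustration of option D, here is a minimal sketch of a target tracking policy with a longer scale-out cooldown, applied through the Application Auto Scaling API in boto3. The endpoint name, variant name, policy name, target value, and cooldown values are all hypothetical placeholders, and the sketch assumes the endpoint variant is already registered as a scalable target.

```python
def build_target_tracking_config(target_invocations, scale_out_cooldown_s, scale_in_cooldown_s):
    """Build a target tracking configuration for a SageMaker endpoint variant."""
    return {
        "TargetValue": float(target_invocations),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        # Seconds to wait after a scale-out activity before another can start.
        "ScaleOutCooldown": int(scale_out_cooldown_s),
        # Seconds to wait after a scale-in activity before another can start.
        "ScaleInCooldown": int(scale_in_cooldown_s),
    }


def apply_policy(endpoint_name, variant_name, config, policy_name="scale-out-cooldown-policy"):
    """Attach the policy to the variant. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the config builder works without boto3 installed

    client = boto3.client("application-autoscaling")
    return client.put_scaling_policy(
        PolicyName=policy_name,  # hypothetical name
        ServiceNamespace="sagemaker",
        ResourceId=f"endpoint/{endpoint_name}/variant/{variant_name}",
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=config,
    )


# Hypothetical values: raise the scale-out cooldown to 600 seconds.
config = build_target_tracking_config(
    target_invocations=70, scale_out_cooldown_s=600, scale_in_cooldown_s=300
)
# apply_policy("my-endpoint", "AllTraffic", config)  # uncomment to call AWS
```

The right cooldown depends on how long an instance takes to start serving traffic (container start plus model load), so the 600-second figure is only a stand-in for a value measured on the actual endpoint.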