AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 199
An airline company uses an ML model to adjust ticket prices based on demand. The model runs on Amazon SageMaker real-time endpoints. During previous deployments, the model failed to scale quickly enough when website traffic increased, which caused delays in price adjustments.
An ML engineer needs to configure auto scaling for the SageMaker endpoints to respond rapidly to traffic changes. The solution must use target tracking scaling policies.
Which configuration will be MOST responsive to sudden changes in traffic?
Answer options
- A. Configure auto scaling based on the SageMaker InvocationsPerInstance standard metric. Configure 10-second interval resolution, and set the default 300-second scale-in cooldown period.
- B. Configure auto scaling based on the SageMaker InvocationsPerInstance metric. Configure high-resolution 10-second intervals, and set a 600-second scale-in cooldown period.
- C. Configure auto scaling based on the SageMaker InvocationsPerInstance standard metric. Configure 10-second interval resolution, and set a 600-second scale-in cooldown period.
- D. Configure auto scaling based on the SageMaker InvocationsPerInstance metric. Configure high-resolution 10-second intervals, and set the default 300-second scale-in cooldown period.
Correct answer: D
Explanation
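Option D is correct. All four options scale on InvocationsPerInstance with a target tracking policy, so the deciding factors are the metric resolution and the scale-in cooldown. High-resolution metrics emitted at 10-second intervals let the policy detect a traffic spike and begin scaling out sooner than standard metrics, which are published at 1-minute intervals. The scale-in cooldown controls how long the policy waits after one scale-in activity before starting another, so keeping the default 300 seconds rather than extending it to 600 seconds lets capacity follow traffic more closely. Options A and C rely on the standard metric, so they react more slowly to a sudden increase. Option B uses the high-resolution metric but doubles the scale-in cooldown, which makes it less responsive than D.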
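As a rough illustration of the configuration described above, the following Python sketch builds the request for the Application Auto Scaling `put_scaling_policy` call with a target tracking policy and a 300-second scale-in cooldown. The endpoint name, variant name, policy name, target value, and capacity limits are hypothetical placeholders. The predefined metric type is passed in as a parameter: `SageMakerVariantInvocationsPerInstance` is the standard type, while the high-resolution type name shown in the comment tracks concurrent requests rather than invocations and is an assumption that should be checked against current AWS documentation.

```python
def build_target_tracking_policy(
    endpoint_name,
    variant_name,
    target_value,
    scale_in_cooldown=300,   # default scale-in cooldown, in seconds
    scale_out_cooldown=300,  # default scale-out cooldown, in seconds
    predefined_metric_type="SageMakerVariantInvocationsPerInstance",
):
    """Build keyword arguments for put_scaling_policy (target tracking)."""
    # Assumption: a high-resolution predefined type such as
    # "SageMakerVariantConcurrentRequestsPerModelHighResolution" can be
    # passed here instead; verify the exact name before relying on it.
    return {
        "PolicyName": f"{endpoint_name}-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_value),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": predefined_metric_type,
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
        },
    }


def apply_policy(policy, min_capacity, max_capacity):
    """Register the variant as a scalable target, then attach the policy.

    Requires boto3 and AWS credentials; imported lazily so the builder
    above can be used without either.
    """
    import boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        ServiceNamespace=policy["ServiceNamespace"],
        ResourceId=policy["ResourceId"],
        ScalableDimension=policy["ScalableDimension"],
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    return client.put_scaling_policy(**policy)


# Hypothetical values for illustration only.
policy = build_target_tracking_policy("pricing-endpoint", "AllTraffic", 10)
```

Calling `apply_policy(policy, min_capacity=1, max_capacity=8)` would send the request to AWS. Raising `scale_in_cooldown` to 600 reproduces the slower configuration in options B and C.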