AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 199

An airline company uses an ML model to adjust ticket prices based on demand. The model runs on Amazon SageMaker real-time endpoints. During previous deployments, the model failed to scale quickly enough when website traffic increased, which caused delays in price adjustments.

An ML engineer needs to configure auto scaling for the SageMaker endpoints to respond rapidly to traffic changes. The solution must use target tracking scaling policies.

Which configuration will be MOST responsive to sudden changes in traffic?

Answer options

Correct answer: D

Explanation

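Option D is correct because it tracks a high-resolution SageMaker metric that is published at 10-second intervals, so the target tracking policy detects a traffic spike and starts scaling out sooner than it would with a standard metric published once per minute. In SageMaker, the 10-second metrics are the concurrency metrics (ConcurrentRequestsPerModel, available to target tracking as the predefined metric type SageMakerVariantConcurrentRequestsPerModelHighResolution); InvocationsPerInstance is the standard one-minute metric. The 300-second scale-in cooldown is appropriate because it only delays removing instances after a scaling activity: it keeps capacity from being released too early while demand fluctuates, without slowing scale-out. The other options either use a standard metric or set longer cooldown periods, both of which delay the response to a sudden traffic increase.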
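
As a rough illustration of the kind of configuration the explanation describes, the sketch below builds a target tracking policy payload and, optionally, applies it with boto3's Application Auto Scaling client. It assumes the high-resolution predefined metric type SageMakerVariantConcurrentRequestsPerModelHighResolution. The endpoint name, variant name, target value, capacity limits, and scale-out cooldown are hypothetical placeholders that the question does not give; only the 300-second scale-in cooldown comes from the explanation.

```python
from typing import Any, Dict

# Predefined metric type backed by SageMaker's 10-second concurrency metric.
HIGH_RES_METRIC = "SageMakerVariantConcurrentRequestsPerModelHighResolution"


def build_target_tracking_config(
    target_value: float,
    scale_in_cooldown: int = 300,  # value taken from the explanation
    scale_out_cooldown: int = 60,  # assumed; not stated in the question
) -> Dict[str, Any]:
    """Return a TargetTrackingScalingPolicyConfiguration payload."""
    return {
        "TargetValue": target_value,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": HIGH_RES_METRIC,
        },
        "ScaleInCooldown": scale_in_cooldown,
        "ScaleOutCooldown": scale_out_cooldown,
    }


def apply_policy(
    endpoint_name: str,
    variant_name: str,
    config: Dict[str, Any],
    min_capacity: int = 1,  # hypothetical capacity limits
    max_capacity: int = 4,
) -> None:
    """Register the variant as a scalable target and attach the policy.

    Requires boto3 and AWS credentials; it is not called by default.
    """
    import boto3  # imported lazily so building the payload needs no AWS SDK

    client = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    client.put_scaling_policy(
        PolicyName="high-res-target-tracking",  # hypothetical policy name
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=config,
    )


if __name__ == "__main__":
    # Hypothetical target: about 5 concurrent requests per model.
    print(build_target_tracking_config(target_value=5.0))
```

To apply it to a real endpoint, one would call something like apply_policy("pricing-endpoint", "AllTraffic", build_target_tracking_config(5.0)), where both names are placeholders. A suitable target value depends on load testing of the specific model.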