Google Cloud Professional Machine Learning Engineer — Question 217

You have built a custom model that performs several memory-intensive preprocessing tasks before it makes a prediction. You deployed the model to a Vertex AI endpoint, and validated that results were received in a reasonable amount of time. After routing user traffic to the endpoint, you discover that the endpoint does not autoscale as expected when receiving multiple requests. What should you do?

Answer options

Correct answer: D

Explanation

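The correct answer is D. Vertex AI autoscaling is driven by CPU utilization by default, with a target of 60%. Because the preprocessing is memory-intensive rather than CPU-intensive, replicas can become overloaded while CPU utilization stays below that target, so the autoscaler never adds replicas. Decreasing the CPU utilization target lets scaling trigger at a lower CPU load, so the endpoint adds replicas as traffic grows. Option A may provide more resources per replica, but it does not address the scaling issue. Option B reduces the number of concurrent requests each machine can handle, which is counterproductive. Option C would make it even harder for the endpoint to scale up when needed.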
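
The effect of the target can be illustrated with a small simulation. This is only a sketch: it assumes the autoscaler follows a target-tracking rule of the form replicas = ceil(current replicas × current utilization / target utilization), and the function name, the replica limits, and the 45% CPU reading are hypothetical values chosen for illustration, not measurements from a real endpoint.

```python
import math


def desired_replicas(current_replicas, cpu_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Replica count a target-tracking autoscaler would request.

    Utilization values are percentages (0-100). The min/max defaults are
    hypothetical and chosen only for this illustration.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Scale the replica count by the ratio of observed to target utilization.
    raw = math.ceil(current_replicas * cpu_utilization / target_utilization)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical memory-bound workload: replicas are saturated on memory,
# but average CPU utilization only reaches 45%.
observed_cpu = 45
current = 2

at_default_target = desired_replicas(current, observed_cpu, target_utilization=60)
at_lower_target = desired_replicas(current, observed_cpu, target_utilization=30)

print(at_default_target)  # 2: no scale-out at the 60% target
print(at_lower_target)    # 3: one replica added at a 30% target
```

With the same observed load, the 60% target leaves the replica count unchanged, while the lower target requests an extra replica. In the Vertex AI Python SDK, the corresponding setting is believed to be the `autoscaling_target_cpu_utilization` argument passed when deploying a model to an endpoint; check the current SDK reference before relying on that name.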