Google Cloud Professional Machine Learning Engineer — Question 234

You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server. You enabled autoscaling; however, the deployed model fails to scale beyond one replica, which leads to dropped requests. You notice that CPU utilization remains low even during periods of high load. What should you do?

Answer options

Correct answer: B

Explanation

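The correct answer is B. Vertex AI autoscaling adds replicas when a utilization metric, by default CPU utilization, exceeds its target. Here CPU utilization stays low under high load, which indicates that the model server has too few workers to use the CPU available on the replica: requests queue up and are dropped, yet the scaling threshold is never reached. Increasing the number of workers in the model server lets each replica process more requests concurrently, which raises CPU utilization under load and allows autoscaling to trigger. Options A and C do not address the immediate limitation of worker capacity, while option D only changes the minimum number of replicas without increasing each replica's ability to handle requests.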
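As an illustration of what "increasing the number of workers" can look like, here is a minimal sketch that assumes the custom model server is run with Gunicorn; the question does not say which server is used, so treat this as one possible setup. The helper recommended_workers is a hypothetical name, and the (2 x cores) + 1 rule is only the starting point suggested in Gunicorn's documentation, not a tuned value. AIP_HTTP_PORT is the port variable Vertex AI sets for custom containers.

```python
# gunicorn.conf.py (hypothetical file name, assuming a Gunicorn-based model server)
import multiprocessing
import os


def recommended_workers(cpu_count: int) -> int:
    """Starting point from the Gunicorn docs: (2 x cores) + 1 workers."""
    return 2 * cpu_count + 1


# Number of worker processes. An explicit WEB_CONCURRENCY value wins;
# otherwise the count is derived from the cores visible to the container.
workers = int(
    os.environ.get(
        "WEB_CONCURRENCY",
        recommended_workers(multiprocessing.cpu_count()),
    )
)

# Listen on the port Vertex AI passes to custom containers (8080 if unset).
bind = "0.0.0.0:" + os.environ.get("AIP_HTTP_PORT", "8080")
```

With more workers per replica, load should show up as higher CPU utilization, which is the signal autoscaling reacts to. The right worker count still depends on the model's memory footprint and should be checked with a load test.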