You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server…

Question

You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server. You enabled autoscaling; however, the deployed model fails to scale beyond one replica, which has led to dropped requests. You notice that CPU utilization remains low even during periods of high load. What should you do?

Accepted Answer

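Correct answer: B. Increase the number of workers in your model server. For CPU-based deployments, Vertex AI autoscaling adds replicas only when CPU utilization rises above a target (60% by default). Because CPU utilization stays low under heavy load, the replica is not CPU-bound. Instead, the model server handles too few requests concurrently (for example, a single worker), so requests queue and are dropped while the CPU sits mostly idle, and the autoscaling threshold is never reached. Increasing the number of workers lets each replica serve more requests in parallel and use more of its CPU. That raises per-replica throughput and lets utilization cross the target, so autoscaling can add replicas. Options A and C do not address this worker-capacity limit, and option D only changes the minimum number of replicas without increasing how many requests each replica can handle.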
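The reasoning above can be illustrated with a toy capacity model. This is a simplified sketch, not Vertex AI's actual autoscaling algorithm. It assumes that each model server worker is single-threaded and can keep at most one vCPU busy, that the machine has a hypothetical 4 vCPUs, and that scale-out happens only when utilization exceeds the 60% default target. The function names are hypothetical.

```python
# Toy capacity model (assumption, not Vertex AI's real algorithm):
# each single-threaded worker keeps at most one vCPU busy, and the
# autoscaler adds replicas only when CPU utilization exceeds a target.

DEFAULT_TARGET_CPU_UTILIZATION = 0.60  # Vertex AI's default target (60%)


def max_cpu_utilization(workers: int, vcpus: int) -> float:
    """Upper bound on one replica's CPU utilization when all workers are busy."""
    if workers < 1 or vcpus < 1:
        raise ValueError("workers and vcpus must be positive")
    # Workers beyond the number of vCPUs cannot push utilization past 100%.
    return min(workers, vcpus) / vcpus


def can_trigger_scale_out(workers: int, vcpus: int,
                          target: float = DEFAULT_TARGET_CPU_UTILIZATION) -> bool:
    """True if a fully loaded replica could push CPU above the autoscaling target."""
    return max_cpu_utilization(workers, vcpus) > target


if __name__ == "__main__":
    vcpus = 4  # hypothetical machine type with 4 vCPUs
    for workers in (1, 2, 4):
        util = max_cpu_utilization(workers, vcpus)
        print(f"workers={workers}: max CPU {util:.0%}, "
              f"scale-out possible: {can_trigger_scale_out(workers, vcpus)}")
```

Under these assumptions, a single worker caps the replica at 25% CPU, so the 60% target is never reached no matter how many requests arrive. With four workers the replica can reach 100% and trigger scale-out. In a real deployment the worker count is set in the model server's own configuration, for example gunicorn's `workers` setting.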

Google Cloud Professional Machine Learning Engineer — Question 234
