You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing t…

Question

You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on live production traffic. While monitoring the endpoint, you discover twice as many requests per hour as expected throughout the day. You want the endpoint to scale efficiently when demand increases in the future to prevent users from experiencing high latency. What should you do?

Accepted Answer

Correct answer: B. Configure an appropriate minReplicaCount value based on expected baseline traffic. Setting minReplicaCount keeps enough replicas running to handle the expected baseline traffic, so ordinary load is served without waiting for the endpoint to scale up, which reduces latency. Option A may not effectively address the scaling needs, while C could lead to inefficient resource usage if set too high. Option D may improve performance but does not directly address the scaling issue related to request load.
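As a rough illustration of what "based on expected baseline traffic" can mean in practice, here is a minimal sketch that sizes a minimum replica count from a traffic estimate. The helper `baseline_min_replicas`, the traffic numbers, the per-replica throughput, the headroom factor, and the machine type are all hypothetical assumptions, not values from the question. The keyword names `min_replica_count` and `max_replica_count` follow the Vertex AI Python SDK's `Model.deploy` parameters; the deploy call itself is left as a comment because it needs a real project and credentials.

```python
import math


def baseline_min_replicas(expected_rps: float,
                          per_replica_rps: float,
                          headroom: float = 0.25) -> int:
    """Estimate replicas needed for baseline traffic (hypothetical helper).

    expected_rps:    baseline requests per second the endpoint should absorb
    per_replica_rps: throughput one replica sustains at acceptable latency
                     (assumed to come from your own load test)
    headroom:        extra capacity fraction kept in reserve
    """
    if expected_rps < 0 or per_replica_rps <= 0 or headroom < 0:
        raise ValueError("traffic and throughput values must be positive")
    needed = expected_rps * (1 + headroom) / per_replica_rps
    # Never go below one replica, and always round up.
    return max(1, math.ceil(needed))


# Hypothetical numbers: traffic turned out to be twice the original estimate.
originally_expected_rps = 50.0
observed_rps = 2 * originally_expected_rps
per_replica_rps = 40.0  # assumption: measured in a load test

min_replicas = baseline_min_replicas(observed_rps, per_replica_rps)

deploy_kwargs = {
    "machine_type": "n1-standard-4",        # assumption: example machine type
    "min_replica_count": min_replicas,      # baseline capacity, always running
    "max_replica_count": min_replicas * 3,  # assumption: ceiling for autoscaling
}
print(deploy_kwargs)

# With the Vertex AI SDK installed and authenticated, the values would be
# passed along these lines (not executed here):
#   model.deploy(endpoint=endpoint, **deploy_kwargs)
```

The point of the sketch is only that the minimum should track measured baseline load rather than stay at a default of one replica; the right headroom and ceiling depend on your own latency targets and budget.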

Google Cloud Professional Machine Learning Engineer — Question 190

Answer options

Explanation