You work for a small company that has deployed an ML model with autoscaling on Vertex AI…

Question

You work for a small company that has deployed an ML model with autoscaling on Vertex AI to serve online predictions in a production environment. The current model receives about 20 prediction requests per hour with an average response time of one second. You have retrained the same model on a new batch of data, and now you are canary testing it, sending ~10% of production traffic to the new model. During this canary test, you notice that prediction requests for your new model are taking between 30 and 180 seconds to complete. What should you do?

Accepted Answer

Correct answer: C. Remove your new model from the production environment. Compare the new model and existing model codes to identify the cause of the performance bottleneck.

Removing the new model from production stops slow responses from reaching users and lets you investigate the problem by comparing its code with the existing model's. Load is an unlikely cause: at 10% of roughly 20 requests per hour, the canary receives only about 2 requests per hour, yet it takes 30 to 180 seconds where the existing model takes about one second. Option A does not address the root cause of the latency, and option B could cause further performance issues because it acts before the bottleneck is understood. Option D adds unnecessary complexity by redirecting requests to BigQuery instead of fixing the model itself.
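The reasoning behind option C can be sketched as a simple canary latency gate. This is a minimal illustration, not part of the exam material or any Vertex AI API: the function name `should_roll_back`, the `max_ratio` threshold of 3.0, and the latency samples are all hypothetical, with the samples chosen only to match the ranges quoted in the question.

```python
from statistics import median


def should_roll_back(baseline_latencies_s, canary_latencies_s, max_ratio=3.0):
    """Return True when the canary's median latency exceeds
    max_ratio times the baseline's median latency.

    max_ratio is an assumed threshold; pick one that suits your service.
    """
    if not baseline_latencies_s or not canary_latencies_s:
        raise ValueError("need at least one latency sample per model")
    return median(canary_latencies_s) > max_ratio * median(baseline_latencies_s)


# Hypothetical samples (seconds) consistent with the question's figures:
# about 1 s for the existing model, 30 to 180 s for the canary.
baseline = [0.9, 1.0, 1.1, 1.0]
canary = [30.0, 95.0, 180.0]

# Traffic reaching the canary: 10% of about 20 requests per hour.
canary_requests_per_hour = 20 * 0.10

print(f"canary load: {canary_requests_per_hour:.1f} requests/hour")
print(f"roll back canary: {should_roll_back(baseline, canary)}")
```

With so little traffic reaching the canary, a gap this large is unlikely to be a capacity problem, which is why pulling the new model and comparing it against the existing one is the sensible next step.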

Google Cloud Professional Machine Learning Engineer — Question 158

Explanation