You have developed a custom ML model using Vertex AI and want to deploy it for online ser…

Question

You have developed a custom ML model using Vertex AI and want to deploy it for online serving. You need to optimize the model's serving performance by ensuring that the model can handle high throughput while minimizing latency. You want to use the simplest solution. What should you do?

Accepted Answer

Correct answer: A. Deploy the model to a Vertex AI endpoint resource to automatically scale the serving backend based on the throughput. Configure the endpoint's autoscaling settings to minimize latency.

A is correct because a model deployed to a Vertex AI endpoint scales automatically with demand, which addresses both throughput and latency. Option B, while viable, adds containerization work that the simplest solution does not require. Options C and D focus on improving or analyzing the model rather than on the immediate serving performance needs.
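
As a rough illustration of option A, the sketch below deploys an already uploaded model to an endpoint with autoscaling bounds, using the Vertex AI SDK for Python (`google-cloud-aiplatform`). The helper names `build_deploy_kwargs` and `deploy_with_autoscaling`, the machine type, the replica counts, and the 60% CPU target are illustrative assumptions, not values taken from the question; tune them for the actual workload.

```python
from typing import Any, Dict


def build_deploy_kwargs(
    min_replicas: int = 1,
    max_replicas: int = 5,
    target_cpu_utilization: int = 60,
    machine_type: str = "n1-standard-4",
) -> Dict[str, Any]:
    """Collect autoscaling settings for Model.deploy() (hypothetical helper)."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    if not 1 <= target_cpu_utilization <= 100:
        raise ValueError("target_cpu_utilization must be a percentage in 1..100")
    return {
        "machine_type": machine_type,  # assumed machine type
        # Replicas kept warm at all times; a higher floor reduces cold-start latency.
        "min_replica_count": min_replicas,
        # Upper bound the endpoint may scale out to under load.
        "max_replica_count": max_replicas,
        # A lower target triggers scale-out earlier, trading cost for headroom.
        "autoscaling_target_cpu_utilization": target_cpu_utilization,
        "traffic_percentage": 100,
    }


def deploy_with_autoscaling(project: str, location: str, model_id: str, **overrides: Any):
    """Deploy an uploaded model to a new endpoint (hypothetical helper)."""
    # Imported lazily so the settings helper above works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model(model_name=model_id)
    # With no endpoint argument, deploy() creates an endpoint and returns it.
    return model.deploy(**build_deploy_kwargs(**overrides))
```

Calling `deploy_with_autoscaling("my-project", "us-central1", "1234567890", max_replicas=10)` (placeholder identifiers) would return an endpoint object whose `predict(instances=[...])` method serves online requests. Raising `min_replicas` or lowering the CPU target are the two most direct levers for latency, at the cost of more idle capacity.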

Google Cloud Professional Machine Learning Engineer — Question 326
