Google Cloud Professional Machine Learning Engineer — Question 167

You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?

Answer options

Correct answer: B

Explanation

The correct answer is B. Setting the max replica count to 100 lets the online Vertex AI prediction endpoint scale out to handle the high request volume expected from 8 am to 7 pm. Because the endpoint autoscales, it can scale back down outside that window, so the extra replicas are billed only while they are needed. Options A and C are inadequate for the required scale and would not support the peak request load. Option D increases capacity with GPUs, but it would incur higher costs, which the question asks you to minimize.
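
The cost reasoning above can be sketched in Python. This is a minimal illustration, not a reference deployment: the machine type, the minimum replica count of 1, and the helper names `autoscaling_kwargs`, `deploy`, and `daily_replica_hours` are assumptions introduced here, and the replica-hour estimate assumes the endpoint runs at the full 100 replicas for the whole peak window and at 1 replica otherwise. The deploy call uses the `Model.deploy` method of the `google-cloud-aiplatform` SDK and is defined but not invoked.

```python
# Peak window taken from the question: 8 am to 7 pm.
PEAK_START_HOUR = 8
PEAK_END_HOUR = 19


def autoscaling_kwargs(max_replicas: int = 100) -> dict:
    """Build deploy arguments for a CPU-only autoscaling endpoint."""
    return {
        "machine_type": "n1-standard-4",  # assumed; choose to fit the model
        "min_replica_count": 1,           # assumed floor for off-peak hours
        "max_replica_count": max_replicas,
    }


def deploy(model_resource_name: str, project: str, location: str):
    """Deploy an uploaded model to an online endpoint (hypothetical helper)."""
    # Imported here so the rest of the sketch runs without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model(model_name=model_resource_name)
    return model.deploy(**autoscaling_kwargs())


def daily_replica_hours(peak_replicas: int, off_peak_replicas: int) -> int:
    """Replica-hours billed per day under the simplified two-level load."""
    peak_hours = PEAK_END_HOUR - PEAK_START_HOUR
    off_peak_hours = 24 - peak_hours
    return peak_hours * peak_replicas + off_peak_hours * off_peak_replicas


autoscaled = daily_replica_hours(peak_replicas=100, off_peak_replicas=1)
always_on = daily_replica_hours(peak_replicas=100, off_peak_replicas=100)
print(autoscaled, always_on)
```

Under these assumptions, the autoscaled endpoint uses 1,113 replica-hours per day compared with 2,400 for a fleet fixed at 100 replicas. Real autoscaling follows actual utilization, so the true figure depends on the traffic profile.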