Google Cloud Professional Machine Learning Engineer — Question 167
You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?
Answer options
- A. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 1
- B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100
- C. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 1
- D. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 100
Correct answer: B
Explanation
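The correct answer is B. Setting the max replica count to 100 lets Vertex AI autoscale the online endpoint: it adds replicas during the 8 am to 7 pm peak to absorb millions of requests per second, then scales back down toward the minimum replica count outside those hours. You pay for the extra nodes only while the traffic needs them, and at least one replica stays up to serve requests 24/7. Options A and C cap the endpoint at a single replica, which cannot handle the peak load. Option D can scale out, but scikit-learn models are generally served on CPU and do not benefit from a GPU, so attaching one GPU per replica raises the cost of every replica without improving throughput. That conflicts with the requirement to minimize cost.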
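To make the cost argument concrete, here is a rough capacity sketch. It assumes hypothetical numbers (a peak of 1,000,000 requests per second, 50,000 off-peak, and 20,000 requests per second per replica) and simplifies autoscaling to "enough replicas for the current load, clamped between the min and max replica counts". Vertex AI actually scales on resource utilization targets, so treat this as an illustration, not a sizing tool. The function names are hypothetical.

```python
import math

# Traffic profile from the question: peak between 8 am and 7 pm.
PEAK_HOURS = 11
OFF_PEAK_HOURS = 24 - PEAK_HOURS


def replicas_needed(rps: float, per_replica_rps: float,
                    min_replicas: int, max_replicas: int) -> int:
    """Simplified autoscaler (an assumption): enough replicas for the load, clamped to [min, max]."""
    wanted = math.ceil(rps / per_replica_rps)
    return max(min_replicas, min(wanted, max_replicas))


def served_fraction(rps: float, per_replica_rps: float,
                    min_replicas: int, max_replicas: int) -> float:
    """Share of incoming requests the scaled-out endpoint can absorb."""
    replicas = replicas_needed(rps, per_replica_rps, min_replicas, max_replicas)
    capacity = replicas * per_replica_rps
    return min(1.0, capacity / rps)


def daily_node_hours(peak_rps: float, off_peak_rps: float, per_replica_rps: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Node-hours per day when the replica count follows the peak/off-peak load."""
    peak = replicas_needed(peak_rps, per_replica_rps, min_replicas, max_replicas)
    off_peak = replicas_needed(off_peak_rps, per_replica_rps, min_replicas, max_replicas)
    return peak * PEAK_HOURS + off_peak * OFF_PEAK_HOURS


if __name__ == "__main__":
    # Hypothetical numbers, not benchmarks.
    PEAK_RPS, OFF_PEAK_RPS, PER_REPLICA_RPS = 1_000_000, 50_000, 20_000
    for max_replicas in (1, 100):
        served = served_fraction(PEAK_RPS, PER_REPLICA_RPS, 1, max_replicas)
        hours = daily_node_hours(PEAK_RPS, OFF_PEAK_RPS, PER_REPLICA_RPS, 1, max_replicas)
        print(f"max={max_replicas}: peak served={served:.0%}, node-hours/day={hours}")
    # For comparison: pinning 100 replicas all day costs 100 * 24 node-hours.
    print(f"fixed 100 replicas: node-hours/day={100 * 24}")
```

With these assumed numbers, a max of 100 lets the endpoint run 50 replicas at peak and 3 off-peak (589 node-hours per day, versus 2,400 for 100 replicas pinned around the clock), while a max of 1 serves only 2% of peak traffic. In the Vertex AI SDK, the two bounds correspond to the `min_replica_count` and `max_replica_count` arguments of `aiplatform.Model.deploy`.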