Google Cloud Professional Machine Learning Engineer — Question 299
You work as an ML researcher at an investment bank, and you are experimenting with the Gemma large language model (LLM). You plan to deploy the model for an internal use case. You need to have full control of the model's underlying infrastructure and minimize the model's inference time. Which serving configuration should you use for this task?
Answer options
- A. Deploy the model on a Vertex AI endpoint manually by creating a custom inference container.
- B. Deploy the model on a Google Kubernetes Engine (GKE) cluster by using the deployment options in Model Garden.
- C. Deploy the model on a Vertex AI endpoint by using one-click deployment in Model Garden.
- D. Deploy the model on a Google Kubernetes Engine (GKE) cluster manually by creating a custom YAML manifest.
Correct answer: D
Explanation
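Option D is correct because writing your own manifest gives you full control over the serving infrastructure. You choose the node pools, accelerator types, serving container, replica count, and scaling behavior, and you can tune each of them to minimize inference time. Options A and C deploy to Vertex AI endpoints, which are a managed service: you select machine and accelerator types, but Google operates the underlying infrastructure, so you do not have full control of it. Option B does use GKE, but it relies on the deployment configuration that Model Garden provides rather than one you author yourself, so it offers less control than a custom manifest.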
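To make option D concrete, the following is a minimal sketch of the kind of manifest you would write yourself. It is written in Python for illustration and emits a Kubernetes Deployment as JSON, which `kubectl apply -f` accepts as well as YAML. The serving image (vLLM's OpenAI-compatible server), the model ID `google/gemma-2b`, the `nvidia-l4` accelerator type, and the Secret name `hf-secret` are all assumptions, not part of the question.

```python
import json

# Illustrative assumptions only: adjust to your own cluster and model.
MODEL_ID = "google/gemma-2b"       # assumed Hugging Face model ID
IMAGE = "vllm/vllm-openai:latest"  # assumed serving image (vLLM OpenAI-compatible server)
ACCELERATOR = "nvidia-l4"          # assumed GKE accelerator type
HF_SECRET = "hf-secret"            # hypothetical Secret holding a Hugging Face token


def gemma_deployment(replicas: int = 1, gpus: int = 1) -> dict:
    """Build a Kubernetes Deployment manifest that serves Gemma on GPU nodes."""
    labels = {"app": "gemma-server"}
    container = {
        "name": "inference-server",
        "image": IMAGE,
        "args": ["--model", MODEL_ID, "--port", "8000"],
        "ports": [{"containerPort": 8000}],
        "env": [{
            # Gemma weights are gated, so the server needs a token (hypothetical key name).
            "name": "HUGGING_FACE_HUB_TOKEN",
            "valueFrom": {"secretKeyRef": {"name": HF_SECRET, "key": "hf_api_token"}},
        }],
        # Requesting GPUs reserves whole accelerators for this pod.
        "resources": {
            "requests": {"nvidia.com/gpu": gpus},
            "limits": {"nvidia.com/gpu": gpus},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "gemma-server"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # GKE labels GPU nodes with cloud.google.com/gke-accelerator.
                    "nodeSelector": {"cloud.google.com/gke-accelerator": ACCELERATOR},
                    "containers": [container],
                },
            },
        },
    }


if __name__ == "__main__":
    # Usage: python gemma_manifest.py > gemma.json && kubectl apply -f gemma.json
    print(json.dumps(gemma_deployment(), indent=2))
```

Because every field is yours to set, you can change the accelerator, replica count, or serving flags to tune latency, which is the level of control the question asks for.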