Google Cloud Professional Machine Learning Engineer — Question 251
You recently deployed a model to a Vertex AI endpoint and set up online serving in Vertex AI Feature Store. You have configured a daily batch ingestion job to update your featurestore. During the batch ingestion jobs, you discover that CPU utilization is high in your featurestore’s online serving nodes and that feature retrieval latency is high. You need to improve online serving performance during the daily batch ingestion. What should you do?
Answer options
- A. Schedule an increase in the number of online serving nodes in your featurestore prior to the batch ingestion jobs
- B. Enable autoscaling of the online serving nodes in your featurestore
- C. Enable autoscaling for the prediction nodes of your DeployedModel in the Vertex AI endpoint
- D. Increase the worker_count in the ImportFeatureValues request of your batch ingestion job
Correct answer: A
Explanation
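Scheduling an increase in the number of online serving nodes before the batch ingestion jobs (option A) adds CPU capacity ahead of a load spike whose timing is already known, so feature retrieval latency can stay low while the import writes to the online store. Autoscaling the online serving nodes (option B) could help, but it reacts to CPU utilization only after it has risen and may not scale up quickly enough for the sudden load of a batch ingestion. Autoscaling the prediction nodes of the DeployedModel (option C) adds model-serving capacity on the endpoint but does nothing for the featurestore's online serving nodes, which are the bottleneck here. Increasing the worker_count in the ImportFeatureValues request (option D) raises ingestion throughput, but more workers write to the online store in parallel, which tends to push CPU utilization on the online serving nodes higher and so worsens online serving latency rather than relieving it.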
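To make option A concrete, here is a minimal sketch of the scheduling logic, assuming the ingestion start time and rough duration are known in advance. The names `ScalePlan`, `plan_scaling`, and `target_node_count`, along with the lead time, cooldown, and node counts, are hypothetical and chosen only for illustration. The sketch computes the desired node count and does not call Vertex AI. In practice the resize might be applied with the Vertex AI SDK (for example `Featurestore.update_online_store(fixed_node_count=...)` in `google-cloud-aiplatform`), but that call is not exercised here and should be checked against the current SDK reference.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScalePlan:
    """When to resize the online store around one batch ingestion run (hypothetical helper)."""
    scale_up_at: datetime
    scale_down_at: datetime
    baseline_nodes: int
    peak_nodes: int


def plan_scaling(ingestion_start, expected_duration, baseline_nodes, peak_nodes,
                 lead_time=timedelta(minutes=30), cooldown=timedelta(minutes=15)):
    """Scale up `lead_time` before ingestion and back down `cooldown` after it should end.

    The default lead time and cooldown are illustrative assumptions, not recommended values.
    """
    if peak_nodes <= baseline_nodes:
        raise ValueError("peak_nodes must exceed baseline_nodes")
    return ScalePlan(
        scale_up_at=ingestion_start - lead_time,
        scale_down_at=ingestion_start + expected_duration + cooldown,
        baseline_nodes=baseline_nodes,
        peak_nodes=peak_nodes,
    )


def target_node_count(plan, now):
    """Return the node count the online store should have at time `now`."""
    if plan.scale_up_at <= now < plan.scale_down_at:
        return plan.peak_nodes
    return plan.baseline_nodes


# Example: daily ingestion at 02:00 that usually takes about an hour.
plan = plan_scaling(
    ingestion_start=datetime(2024, 1, 1, 2, 0),
    expected_duration=timedelta(hours=1),
    baseline_nodes=2,
    peak_nodes=6,
)
print(target_node_count(plan, datetime(2024, 1, 1, 1, 45)))  # 6 (inside the scaled-up window)
print(target_node_count(plan, datetime(2024, 1, 1, 4, 0)))   # 2 (back to baseline)
```

A scheduler (cron, Cloud Scheduler, or similar) could evaluate `target_node_count` at the two boundary times and apply the result, so the extra capacity is already in place when the import begins instead of arriving after latency has degraded.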