Google Cloud Professional Machine Learning Engineer — Question 251

You recently deployed a model to a Vertex AI endpoint and set up online serving in Vertex AI Feature Store. You have configured a daily batch ingestion job to update your featurestore. During the batch ingestion jobs, you discover that CPU utilization is high in your featurestore’s online serving nodes and that feature retrieval latency is high. You need to improve online serving performance during the daily batch ingestion. What should you do?

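Answer options

A. Increase the number of online serving nodes in the featurestore before the batch ingestion jobs.
B. Enable autoscaling of the online serving nodes in the featurestore.
C. Enable autoscaling for the prediction nodes of the DeployedModel in the Vertex AI endpoint.
D. Increase the worker_count in the batch ingestion job.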

Correct answer: A

Explanation

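The high CPU utilization and retrieval latency both originate in the featurestore's online serving nodes, which must absorb the ingestion writes while still answering feature lookups. Increasing the number of online serving nodes before the batch ingestion jobs (option A) adds capacity ahead of a load spike whose timing is already known, so serving performance holds up during ingestion. Autoscaling the online serving nodes (option B) could help, but it reacts to load after it has risen and may not add nodes quickly enough for the start of the ingestion window. Autoscaling the prediction nodes of the DeployedModel (option C) scales the model endpoint, not the featurestore, so it does not relieve the online serving nodes. Increasing the worker_count in the batch ingestion job (option D) speeds up ingestion but does not reduce online serving latency, and more concurrent writers can put additional load on the online store.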
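
The approach in option A can be sketched as a small helper that raises the online serving node count before ingestion and restores it afterwards. This is a minimal illustration, not a definitive implementation: the helper name `scaled_online_store` and the node counts are hypothetical, and the only thing it assumes about the `featurestore` object is an `update_online_store(fixed_node_count=...)` method, as offered by `aiplatform.Featurestore` in the Vertex AI SDK for Python.

```python
from contextlib import contextmanager


@contextmanager
def scaled_online_store(featurestore, peak_nodes, baseline_nodes):
    """Temporarily run the online store with more serving nodes.

    `featurestore` is assumed to expose
    update_online_store(fixed_node_count=...), as
    google.cloud.aiplatform.Featurestore does. The node counts are
    illustrative values chosen by the caller.
    """
    # Scale up before ingestion starts so capacity is already in place.
    featurestore.update_online_store(fixed_node_count=peak_nodes)
    try:
        yield featurestore
    finally:
        # Scale back down even if the ingestion job raised an error.
        featurestore.update_online_store(fixed_node_count=baseline_nodes)


# Hypothetical usage (requires google-cloud-aiplatform and credentials):
#
#   from google.cloud import aiplatform
#   fs = aiplatform.Featurestore(featurestore_name="my_featurestore")
#   with scaled_online_store(fs, peak_nodes=4, baseline_nodes=2):
#       run_daily_ingestion()  # placeholder for the batch ingestion call
```

In practice the same two calls would likely be triggered by a scheduler shortly before and after the daily job, and the peak node count would be sized from the CPU utilization observed during past ingestion runs.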