Google Cloud Professional Machine Learning Engineer — Question 69
You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error. What should you do?
Answer options
- A. Use batch prediction mode instead of online mode.
- B. Send the request again with a smaller batch of instances.
- C. Use base64 to encode your data before using it for prediction.
- D. Apply for a quota increase for the number of prediction requests.
Correct answer: B
Explanation
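The correct answer is B. An “Out of Memory” error during online prediction typically means the request was too large for the serving node to process in memory. Sending fewer instances per request lowers the memory needed to handle it, so retrying with a smaller batch of instances addresses the cause directly. Option A is incorrect because batch prediction is an asynchronous mode: it does not fix the failing online request and does not meet the real-time requirement in the question. Option C is incorrect because base64 encoding only changes how the data is represented; it does not reduce memory use and in fact makes the payload larger. Option D is incorrect because a quota increase raises the number of prediction requests allowed, not the memory available to serve each one.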
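As a rough illustration of option B, the sketch below splits a list of instances into smaller chunks and sends each chunk as its own request. The helper names `chunked` and `predict_in_chunks`, the `chunk_size` parameter, and the stand-in `fake_predict` function are all hypothetical and are not part of any Vertex AI API; `predict_fn` is assumed to be any callable that takes a list of instances and returns one prediction per instance.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_chunks(
    predict_fn: Callable[[Sequence[Any]], List[Any]],
    instances: Sequence[Any],
    chunk_size: int,
) -> List[Any]:
    """Call `predict_fn` once per chunk and concatenate the results in order.

    `predict_fn` is a hypothetical callable standing in for one online
    prediction request; each call sees at most `chunk_size` instances.
    """
    predictions: List[Any] = []
    for chunk in chunked(instances, chunk_size):
        predictions.extend(predict_fn(chunk))
    return predictions


def fake_predict(chunk: Sequence[int]) -> List[int]:
    """Stand-in for a deployed model (assumption): doubles each input."""
    return [x * 2 for x in chunk]


if __name__ == "__main__":
    # Five instances sent as three requests of at most two instances each.
    print(predict_in_chunks(fake_predict, [1, 2, 3, 4, 5], chunk_size=2))
```

With the Vertex AI SDK for Python, `predict_fn` could plausibly be something like `lambda chunk: endpoint.predict(instances=chunk).predictions`, assuming `endpoint` is an already initialized `aiplatform.Endpoint`. A suitable chunk size depends on the model and the machine type, so it would need to be found by experiment.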