Google Cloud Professional Machine Learning Engineer — Question 157
You are an ML engineer at a retail company. You have built a model that predicts a coupon to offer an ecommerce customer at checkout based on the items in their cart. When a customer goes to checkout, your serving pipeline, which is hosted on Google Cloud, joins the customer's existing cart with a row in a BigQuery table that contains the customers' historic purchase behavior and uses that as the model's input. The web team is reporting that your model is returning predictions too slowly to load the coupon offer with the rest of the web page. How should you speed up your model's predictions?
Answer options
- A. Attach an NVIDIA P100 GPU to your deployed model’s instance.
- B. Use a low latency database for the customers’ historic purchase behavior.
- C. Deploy your model to more instances behind a load balancer to distribute traffic.
- D. Create a materialized view in BigQuery with the necessary data for predictions.
Correct answer: B
Explanation
The correct answer is B. The bottleneck is the per-request lookup of the customer's historic purchase behavior: BigQuery is an analytical data warehouse built for large scans rather than single-row reads, so fetching one row at checkout adds latency to every prediction. Moving that data to a low latency database shortens the retrieval step that each prediction waits on. Option A adds compute for inference but does not address data retrieval speed. Option C lets the service handle more simultaneous requests but does not shorten any individual prediction, since each request still waits on the same lookup. Option D may reduce the work done per query, but the lookup would still go through BigQuery, which is not optimized for real-time, per-request reads.
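The reasoning above can be sketched as a simple latency budget. This is a minimal illustration, not a benchmark: the class `RequestLatency` and every millisecond figure are hypothetical values chosen to show the shape of the argument, not measurements of BigQuery, a GPU, or any particular database.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RequestLatency:
    """Per-request latency components in milliseconds (illustrative values only)."""

    feature_lookup_ms: float  # fetching the customer's purchase-history row
    inference_ms: float       # running the model on the joined input

    @property
    def total_ms(self) -> float:
        # The lookup must finish before inference can start, so the costs add.
        return self.feature_lookup_ms + self.inference_ms


# Hypothetical baseline: the data lookup dominates the request time.
baseline = RequestLatency(feature_lookup_ms=1000.0, inference_ms=20.0)

# Option A (assumed effect): a GPU shortens inference only.
with_gpu = replace(baseline, inference_ms=5.0)

# Option C: more replicas raise throughput, but each request does the same work.
with_replicas = baseline

# Option B (assumed effect): a low latency store shortens the lookup.
with_fast_store = replace(baseline, feature_lookup_ms=10.0)

scenarios = {
    "baseline": baseline,
    "A: add GPU": with_gpu,
    "C: more instances": with_replicas,
    "B: low latency database": with_fast_store,
}
for name, scenario in scenarios.items():
    print(f"{name}: {scenario.total_ms:.0f} ms")
```

Under these assumed numbers, only the change to the lookup step moves the total meaningfully, because it targets the component that dominates each request.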