Google Cloud Professional Machine Learning Engineer — Question 315
You are building an ML model to predict customer churn for a subscription service. You have trained your model on Vertex AI using historical data, and deployed it to a Vertex AI endpoint for real-time predictions. After a few weeks, you notice that the model's performance, measured by AUC (area under the ROC curve), has dropped significantly in production compared to its performance during training. How should you troubleshoot this problem?
Answer options
- A. Monitor the training/serving skew of feature values for requests sent to the endpoint.
- B. Monitor the resource utilization of the endpoint, such as CPU and memory usage, to identify potential bottlenecks in performance.
- C. Enable Vertex Explainable AI feature attribution to analyze model predictions and understand the impact of each feature on the model's predictions.
- D. Monitor the latency of the endpoint to determine whether predictions are being served within the expected time frame.
Correct answer: A
Explanation
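The correct answer is A. The model itself has not changed since deployment, so a drop in AUC after a few weeks most likely means the production inputs no longer resemble the training data. Monitoring training/serving skew compares the distribution of each feature in the requests sent to the endpoint against its distribution in the training data, and flags the features that have diverged. Options B and D (resource utilization and latency) measure how quickly and reliably predictions are served, not how accurate they are. Option C (feature attribution) shows how each feature influences predictions, but analyzing attributions does not by itself reveal whether the input distribution has changed.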
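To make the idea of training/serving skew concrete, here is a minimal sketch of the comparison in plain Python. It is not the Vertex AI implementation. It assumes a common convention of scoring categorical features with the L-infinity distance and numerical features with the Jensen-Shannon divergence over a fixed histogram. The feature names (`plan`, monthly charges), the sample values, the bin edges, and the 0.3 alert threshold are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def categorical_distribution(values):
    """Return the relative frequency of each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(train_values, serving_values):
    """Largest absolute difference in category frequency between two samples."""
    p = categorical_distribution(train_values)
    q = categorical_distribution(serving_values)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def histogram(values, edges):
    """Normalized histogram over fixed bin edges.

    Values outside the edges are clamped into the first or last bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the result lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical categorical feature: subscription plan.
train_plan = ["basic"] * 60 + ["premium"] * 40
serving_plan = ["basic"] * 20 + ["premium"] * 80
plan_skew = l_infinity_distance(train_plan, serving_plan)

# Hypothetical numerical feature: monthly charges, binned on shared edges.
edges = [0, 25, 50, 75, 100]
train_hist = histogram([10, 20, 30, 40], edges)
serving_hist = histogram([40, 50, 60, 70], edges)
charges_skew = js_divergence(train_hist, serving_hist)

ALERT_THRESHOLD = 0.3  # illustrative assumption, tune per feature
for name, score in [("plan", plan_skew), ("monthly_charges", charges_skew)]:
    status = "SKEW" if score > ALERT_THRESHOLD else "ok"
    print(f"{name}: {score:.3f} [{status}]")
```

A feature whose score crosses the threshold is a candidate cause of the AUC drop. In practice the training baseline would come from the training dataset and the serving sample from logged endpoint requests, which is the comparison option A asks you to monitor.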