Google Cloud Professional Machine Learning Engineer — Question 44
You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of
99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
Answer options
- A. Address the model overfitting by using a less complex algorithm.
- B. Address data leakage by applying nested cross-validation during model training.
- C. Address data leakage by removing features highly correlated with the target value.
- D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Correct answer: B
Explanation
The correct answer is B, as nested cross-validation helps to prevent data leakage by ensuring that the model is validated on unseen data during the training process. Option A is incorrect because using a less complex algorithm may not address the underlying issue of data leakage. Option C is also incorrect since simply removing correlated features does not directly address the problem of data leakage, which could be more complex. Option D is incorrect as tuning hyperparameters to reduce AUC ROC value does not solve overfitting and may lead to worse model performance.