Google Cloud Professional Machine Learning Engineer — Question 73
You are working on a classification problem with time series data. After conducting just a few experiments using random cross-validation, you achieved an Area Under the Receiver Operating Characteristic Curve (AUC ROC) value of 99% on the training data. You haven’t explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
Answer options
- A. Address the model overfitting by using a less complex algorithm and using k-fold cross-validation.
- B. Address data leakage by applying nested cross-validation during model training.
- C. Address data leakage by removing features highly correlated with the target value.
- D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Correct answer: B
Explanation
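The correct answer is B. A 99% AUC ROC after only a few experiments, with no sophisticated algorithm and no hyperparameter tuning, is implausibly high and points to data leakage rather than overfitting. With time series data, random cross-validation shuffles observations, so the training folds can contain records from later in time than the records being predicted. Nested cross-validation keeps model selection in an inner loop and performance estimation in an outer loop, so the data used to estimate performance never influences training or tuning; for time series, the splits should also respect temporal order. Option A treats the symptom as overfitting, and a simpler algorithm evaluated with ordinary k-fold cross-validation would still mix past and future observations. Option C names leakage but only drops features correlated with the target, which can discard legitimate predictors and leaves the flawed validation scheme in place. Option D is wrong because lowering the AUC ROC is not a goal in itself; the goal is a trustworthy estimate of performance on unseen data.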
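
To make the idea concrete, here is a minimal sketch of how nested, time-ordered splits could be generated. It is an illustration under stated assumptions, not the exam's reference solution: the function names `forward_chaining_splits` and `nested_splits` are hypothetical, rows are assumed to be already sorted by timestamp, and only index bookkeeping is shown (no model is trained). In practice a library splitter such as scikit-learn's `TimeSeriesSplit` would likely be used instead of hand-written code.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def forward_chaining_splits(indices: List[int], n_splits: int) -> Iterator[Split]:
    """Yield (train, test) index lists where every test index follows every train index.

    Assumes `indices` is already in chronological order (hypothetical helper).
    """
    n = len(indices)
    fold = n // (n_splits + 1)  # size of each test block
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        # The training window grows with k; the test block comes right after it.
        train_end = n - (n_splits - k + 1) * fold
        yield indices[:train_end], indices[train_end:train_end + fold]


def nested_splits(
    n_samples: int, outer_splits: int, inner_splits: int
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_folds) for nested cross-validation.

    Inner folds are built only from outer_train, so hyperparameter tuning
    never sees the rows reserved for the outer performance estimate.
    """
    all_idx = list(range(n_samples))
    for outer_train, outer_test in forward_chaining_splits(all_idx, outer_splits):
        inner_folds = list(forward_chaining_splits(outer_train, inner_splits))
        yield outer_train, outer_test, inner_folds


if __name__ == "__main__":
    for outer_train, outer_test, inner_folds in nested_splits(20, 3, 2):
        print("outer train:", outer_train[0], "to", outer_train[-1],
              "| outer test:", outer_test[0], "to", outer_test[-1],
              "| inner folds:", len(inner_folds))
```

In each outer fold, hyperparameters would be chosen using only the inner folds, and the chosen model would then be scored once on the outer test block. Because every validation and test index is later than the indices used to fit the model, a leak from future observations cannot inflate the score in the way random cross-validation allows.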