Google Cloud Professional Machine Learning Engineer — Question 44

You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of
99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?

Answer options

Correct answer: B

Explanation

The correct answer is B, as nested cross-validation helps to prevent data leakage by ensuring that the model is validated on unseen data during the training process. Option A is incorrect because using a less complex algorithm may not address the underlying issue of data leakage. Option C is also incorrect since simply removing correlated features does not directly address the problem of data leakage, which could be more complex. Option D is incorrect as tuning hyperparameters to reduce AUC ROC value does not solve overfitting and may lead to worse model performance.