Google Cloud Professional Machine Learning Engineer — Question 57

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

Answer options

Correct answer: C

Explanation

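The correct answer is C. With only 1% of transactions labeled as fraudulent, a classifier can reach 99% accuracy by predicting that every transaction is legitimate, so the minority class is easily ignored during training. Oversampling the fraudulent transactions gives the random forest more positive examples to learn from and reduces this bias toward the majority class. Options A and B do not directly address the imbalance, and D deals only with encoding categorical features, which leaves the shortage of fraudulent cases unsolved.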
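
As a rough illustration of the oversampling idea, the sketch below duplicates minority-class rows at random until fraud is about ten times more frequent than before. The helper name oversample_minority, the factor of 10, and the synthetic data are assumptions made for this example and do not come from the question; it uses NumPy only and is not the exam's reference solution.

```python
import numpy as np

def oversample_minority(X, y, minority_label=1, factor=10, seed=0):
    """Hypothetical helper: resample minority rows with replacement so the
    minority class ends up with `factor` times its original row count."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    # Draw (factor - 1) * n_minority extra rows from the minority class,
    # with replacement; the original rows are all kept.
    extra_idx = rng.choice(
        minority_idx, size=(factor - 1) * len(minority_idx), replace=True
    )
    all_idx = np.concatenate([np.arange(len(y)), extra_idx])
    rng.shuffle(all_idx)  # avoid leaving the duplicates grouped at the end
    return X[all_idx], y[all_idx]

# Synthetic stand-in for a training split: 1,000 transactions, 1% fraud.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(1000, 4))
y_train = np.zeros(1000, dtype=int)
y_train[:10] = 1

X_bal, y_bal = oversample_minority(X_train, y_train, factor=10)
fraud_share_before = y_train.mean()  # 10 of 1,000 rows
fraud_share_after = y_bal.mean()     # 100 of 1,090 rows
```

Oversampling would normally be applied to the training split only, so that validation and test data keep the real 1% fraud rate and the reported metrics stay realistic.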