Google Cloud Professional Machine Learning Engineer — Question 143

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

Answer options

Correct answer: C

Explanation

The correct answer is C because oversampling the minority class (here, the fraudulent transactions) rebalances the training data so that the classifier sees fraud cases more often during training, which can improve its ability to detect them. The other options do not address the class imbalance directly. For instance, the Box-Cox and log transformations reshape the distribution of individual features rather than balance class representation, and tree-based models such as random forests are largely insensitive to such monotonic feature transformations.
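
As a rough illustration of the oversampling idea described above, the following minimal sketch duplicates minority-class rows at random until both classes are the same size. It uses only the Python standard library. The function name random_oversample, the toy data, and the assumption that fraud is labeled 1 are all hypothetical and are not part of the original question; in practice a dedicated library or a synthetic method such as SMOTE might be used instead.

```python
import random
from collections import Counter


def random_oversample(features, labels, minority_label=1, seed=0):
    """Duplicate minority-class rows at random until both classes are equal in size.

    Hypothetical helper for illustration; assumes a binary label column.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority examples to oversample")

    # Draw extra minority indices with replacement to close the gap.
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]

    # Shuffle so duplicated rows are not grouped at the end.
    order = majority + minority + extra
    rng.shuffle(order)
    return [features[i] for i in order], [labels[i] for i in order]


# Toy training split (made-up data): 1,000 transactions, 1% labeled fraudulent.
train_X = [[float(i), float(i % 7)] for i in range(1000)]
train_y = [1 if i < 10 else 0 for i in range(1000)]

balanced_X, balanced_y = random_oversample(train_X, train_y)
print("before:", Counter(train_y))
print("after: ", Counter(balanced_y))
```

Oversampling of this kind is normally applied to the training split only, so that the validation and test data keep the original 1% fraud rate and the evaluation reflects real-world conditions.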