Google Cloud Professional Machine Learning Engineer — Question 143
You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?
Answer options
- A. Modify the target variable using the Box-Cox transformation.
- B. Z-normalize all the numeric features.
- C. Oversample the fraudulent transactions 10 times.
- D. Log transform all numeric features.
Correct answer: C
Explanation
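The correct answer is C. Fraudulent transactions make up only 1% of the data, so a classifier trained on the raw dataset can score high accuracy by predicting the majority class almost every time while rarely flagging fraud. Oversampling the fraudulent transactions gives the minority class more weight during training, which typically improves how well the model detects fraud. The other options do not address the class imbalance. The Box-Cox transformation (A) reshapes the distribution of a continuous, positive-valued variable and does not apply to a binary fraud label. Z-normalization (B) and log transformation (D) rescale numeric features without changing their order, and a random forest, which splits on feature thresholds, is largely insensitive to such monotonic rescaling.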
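To make option C concrete, here is a minimal sketch of 10x oversampling in plain Python. The data is synthetic, and the helper name `oversample_minority`, the label encoding (1 = fraud, 0 = legitimate), and the 1,000-row size are assumptions for illustration, not part of the question.

```python
import random

def oversample_minority(rows, labels, minority_label=1, factor=10):
    """Return new lists in which every minority-class row appears `factor` times."""
    out_rows, out_labels = [], []
    for row, label in zip(rows, labels):
        repeats = factor if label == minority_label else 1
        out_rows.extend([row] * repeats)
        out_labels.extend([label] * repeats)
    return out_rows, out_labels

# Synthetic stand-in data (assumption): 1,000 transactions, 1% fraudulent.
rng = random.Random(0)
labels = [1] * 10 + [0] * 990
rows = [[rng.random(), rng.random()] for _ in labels]

train_rows, train_labels = oversample_minority(rows, labels)
fraud_share = sum(train_labels) / len(train_labels)
print(f"{len(train_labels)} rows, fraud share {fraud_share:.3f}")
# prints "1090 rows, fraud share 0.092"
```

Repeating each fraud row 10 times raises the fraud share from 1% to roughly 9%, so the classes are still not fully balanced, but the minority class carries far more weight in each tree. In practice, oversampling should be applied only to the training split, after the train/validation split, so that duplicated rows do not leak into the evaluation data.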