Google Cloud Professional Machine Learning Engineer — Question 57
You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?
Answer options
- A. Write your data in TFRecords.
- B. Z-normalize all the numeric features.
- C. Oversample the fraudulent transactions 10 times.
- D. Use one-hot encoding on all categorical features.
Correct answer: C
Explanation
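The correct answer is C. With only 1% of transactions labeled fraudulent, a classifier can reach 99% accuracy by predicting "not fraud" every time, so it has little incentive to learn the minority class. Oversampling the fraudulent transactions 10 times raises their share of the training data from 1% to roughly 9% (10 fraudulent rows for every 99 legitimate ones), which gives the fraud class more weight when the trees choose their splits. The other options leave the class ratio unchanged. A changes only the storage format. B has little effect on a random forest, because tree splits are based on thresholds and are insensitive to monotonic rescaling of a feature. D changes how categorical features are represented but does nothing about the shortage of fraudulent cases.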
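The effect of option C on the class ratio can be checked with a small sketch. This is an illustration under stated assumptions, not a reference implementation: the helper `oversample_minority`, the toy `transaction_id` rows, and the 1,000-row dataset are all hypothetical, and only the Python standard library is used.

```python
import random


def oversample_minority(rows, labels, minority_label=1, factor=10, seed=0):
    """Return shuffled copies of rows/labels with each minority row repeated `factor` times.

    Hypothetical helper for illustration; majority rows are kept once each.
    """
    pairs = []
    for row, label in zip(rows, labels):
        copies = factor if label == minority_label else 1
        pairs.extend([(row, label)] * copies)
    # Shuffle so the duplicated rows are not grouped together.
    random.Random(seed).shuffle(pairs)
    new_rows = [row for row, _ in pairs]
    new_labels = [label for _, label in pairs]
    return new_rows, new_labels


# Toy data mirroring the question: 1,000 transactions, 1% fraudulent (label 1).
labels = [1] * 10 + [0] * 990
rows = [{"transaction_id": i} for i in range(len(labels))]

train_rows, train_labels = oversample_minority(rows, labels)
fraud_share = sum(train_labels) / len(train_labels)
print(f"{len(train_labels)} rows, fraud share = {fraud_share:.3f}")
# prints "1090 rows, fraud share = 0.092"
```

In practice the duplication would normally be applied to the training split only, so that evaluation still reflects the original 1% fraud rate.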