Google Cloud Professional Machine Learning Engineer — Question 333
You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?
Answer options
- A. Use sparse representation in the test set.
- B. Randomly redistribute the data, with 70% for the training set and 30% for the test set
- C. Apply one-hot encoding on the categorical variables in the test data
- D. Collect more data representing all categories
Correct answer: C
Explanation
The correct action is to apply one-hot encoding on the test data because this ensures that the test set has the same categorical variable structure as the training set. Options A and D do not directly address the mismatch in encoding, and option B alters the data split without solving the encoding issue.