Google Cloud Professional Machine Learning Engineer — Question 333

You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?

Answer options

Correct answer: C

Explanation

The correct action is to apply one-hot encoding to the categorical variables in the test data, using the same encoding that was applied to the training set. This gives the test set the same feature columns as the training set, so the model receives inputs with the structure it was trained on. Options A and D do not directly address the mismatch in encoding, and option B alters the data split without solving the encoding issue.
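
The idea behind the explanation can be illustrated with a minimal sketch. It assumes the mismatch is a category value that appears in the training split but not in the test split. The helper names (fit_categories, one_hot) and the color data are hypothetical and chosen only for illustration; they do not come from the question or from any particular library.

```python
def fit_categories(values):
    """Learn the category vocabulary from the training column only."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode each value as a 0/1 row with one column per known category.

    Values that were not seen when the vocabulary was fitted
    map to an all-zero row.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


# Hypothetical data: "green" appears in training but not in the test split.
train_colors = ["red", "green", "blue", "red"]
test_colors = ["blue", "red"]

# Fit the vocabulary on the training data, then reuse it for both splits.
categories = fit_categories(train_colors)
train_encoded = one_hot(train_colors, categories)
test_encoded = one_hot(test_colors, categories)
```

Because the vocabulary is learned once from the training data and reused, the encoded test rows have one column per training category, including a column for "green" that is simply zero in every test row. Encoding the test split independently would instead yield fewer columns than the model expects.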