Google Cloud Professional Machine Learning Engineer — Question 134
While performing exploratory data analysis on a dataset, you find that an important categorical feature has 5% null values. You want to minimize the bias that could result from the missing values. How should you handle the missing values?
Answer options
- A. Remove the rows with missing values, and upsample your dataset by 5%.
- B. Replace the missing values with the feature’s mean.
- C. Replace the missing values with a placeholder category indicating a missing value.
- D. Move the rows with missing values to your validation dataset.
Correct answer: C
Explanation
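The correct approach is to replace the missing values with a placeholder category that indicates a missing value. This keeps every row and lets the model learn whether missingness itself is predictive. Removing the rows can bias the dataset if the values are not missing at random, and upsampling afterwards only duplicates the remaining rows without restoring the lost information. Replacing with the mean is not possible for a categorical feature, because a mean is not defined for categories. Moving the rows to the validation dataset removes them from training, so the model never learns to handle missing values, and it makes the validation set unrepresentative of the training data.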
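As a rough illustration of option C, the sketch below fills nulls in a categorical column with an explicit placeholder before one-hot encoding. It uses pandas; the toy data, the column name `color`, and the placeholder label `MISSING` are assumptions made up for this example and are not part of the question.

```python
import pandas as pd

# Hypothetical toy data: "color" stands in for the categorical feature
# from the question, with some null values.
df = pd.DataFrame({"color": ["red", None, "blue", "red", None]})

# Replace nulls with an explicit placeholder category.
# "MISSING" is an arbitrary label chosen for this sketch.
df["color_filled"] = df["color"].fillna("MISSING")

# One-hot encode the feature; the placeholder becomes its own column,
# so a model can learn from the fact that a value was absent.
encoded = pd.get_dummies(df["color_filled"], prefix="color")

print(df["color_filled"].tolist())
print(sorted(encoded.columns))
```

No rows are dropped here, and the `color_MISSING` indicator column carries the missingness signal into training. The same placeholder would need to be applied at serving time so that training and prediction inputs are encoded consistently.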