Databricks Certified Machine Learning Associate — Question 37
In which of the following situations is it preferable to impute missing feature values with the median rather than the mean?
Answer options
- A. When the features are of the categorical type
- B. When the features are of the boolean type
- C. When the features contain a lot of extreme outliers
- D. When the features contain no outliers
- E. When the features contain no missing values
Correct answer: C
Explanation
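The median is less sensitive to extreme outliers than the mean, which makes it the better choice when the data contain many outliers. A few extreme values can pull the mean far from the typical value, so mean-imputed entries would misrepresent the feature, whereas the median stays near the center of the distribution. The other options give no reason to prefer the median: categorical and boolean features are usually imputed with the mode, since neither the mean nor the median is a natural summary for them; without outliers the mean and median often give similar results; and if no values are missing, no imputation is needed.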
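As a rough illustration of the explanation above, the following Python sketch compares the two fill values on a small feature column with one extreme outlier. The data and variable names are hypothetical and chosen only for this example; it uses the standard library rather than any Databricks-specific API.

```python
from statistics import mean, median

# Hypothetical feature column; None marks a missing entry and
# 1000 is an extreme outlier relative to the other values.
values = [30, 32, 35, 31, None, 33, 1000, 34]

# Compute the candidate fill values from the observed entries only.
observed = [v for v in values if v is not None]
mean_fill = mean(observed)      # pulled far upward by the outlier
median_fill = median(observed)  # stays near the typical values

# Replace each missing entry with the chosen statistic.
imputed_with_mean = [mean_fill if v is None else v for v in values]
imputed_with_median = [median_fill if v is None else v for v in values]

print(f"mean fill:   {mean_fill:.1f}")
print(f"median fill: {median_fill}")
```

Here the mean fill lands well above every non-outlier value, while the median fill sits among them. In Spark ML, the same choice is typically made through the `strategy` parameter of `pyspark.ml.feature.Imputer`, which accepts `"median"` as well as `"mean"`.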