Databricks Certified Machine Learning Associate — Question 13

A data scientist has replaced missing values in their feature set with each respective feature variable’s median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.
Which of the following approaches can they take to include as much information as possible in the feature set?

Answer options

Correct answer: D

Explanation

The correct answer is D because creating a binary feature to indicate imputation preserves information about which values were originally missing, which may be valuable for the model. Options A and C either change the data distribution or remove potentially useful features entirely. Option B neglects the missing data without any alternative strategy, and option E provides some context but does not directly inform the model about the imputed values.