Google Cloud Professional Machine Learning Engineer — Question 129
You are developing an ML model to predict house prices. While preparing the data, you discover that an important predictor variable, distance from the closest school, is often missing and does not have high variance. Every instance (row) in your data is important. How should you handle the missing data?
Answer options
- A. Delete the rows that have missing values.
- B. Apply feature crossing with another column that does not have missing values.
- C. Predict the missing values using linear regression.
- D. Replace the missing values with zeros.
Correct answer: C
Explanation
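The correct answer is C. Because every row matters, the missing values have to be filled in rather than dropped. A linear regression model trained on the rows where the distance to the closest school is present can predict that distance from the other available features, so the dataset stays complete. Option A deletes rows, which the scenario rules out. Option D keeps the rows but injects a misleading value: a distance of zero means a school sits at the property, so zero-filling distorts the feature's distribution. Option B adds complexity without filling in the missing values.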
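Regression imputation can be sketched in plain Python as follows. This is a minimal illustration, not the exam's reference implementation: it assumes the predictor columns have no missing values, fits ordinary least squares via the normal equations, and uses hypothetical column names (`school_km`, `city_km`) and made-up toy data.

```python
from typing import Dict, List, Optional


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... by solving the normal equations (A^T A) b = A^T y."""
    # Prepend a column of ones so the first coefficient is the intercept.
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Build the augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        if abs(p) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def impute_with_regression(
    rows: List[Dict[str, Optional[float]]], target: str, predictors: List[str]
) -> List[Dict[str, Optional[float]]]:
    """Return copies of rows with missing `target` values filled by OLS predictions.

    Assumes every predictor column is fully populated (hypothetical simplification).
    """
    complete = [r for r in rows if r[target] is not None]
    X = [[r[p] for p in predictors] for r in complete]
    y = [r[target] for r in complete]
    coef = fit_ols(X, y)
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if row[target] is None:
            row[target] = coef[0] + sum(c * row[p] for c, p in zip(coef[1:], predictors))
        filled.append(row)
    return filled


if __name__ == "__main__":
    # Toy data (made up): school distance is missing for the fourth house.
    houses = [
        {"city_km": 1.0, "school_km": 0.7},
        {"city_km": 2.0, "school_km": 0.9},
        {"city_km": 3.0, "school_km": 1.1},
        {"city_km": 4.0, "school_km": None},
        {"city_km": 5.0, "school_km": 1.5},
    ]
    for h in impute_with_regression(houses, "school_km", ["city_km"]):
        print(h)
```

Every row is kept and only the missing cells receive predicted values. In practice a library model (for example, scikit-learn's `LinearRegression`) would usually replace the hand-written solver; the hand-rolled version here just keeps the sketch dependency-free.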