AWS Certified Machine Learning – Specialty — Question 38

An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data.
Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?

Answer options

Correct answer: C

Explanation

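Multiple imputation is the best choice because it uses the relationships between the incomplete column and the other columns to generate several plausible completed datasets, analyzes each one, and pools the results. This reflects the uncertainty about the missing values instead of hiding it, which preserves the dataset's integrity. In contrast, listwise deletion removes every row that has a missing value, which here would discard about 30% of the rows and lose valuable data. Last observation carried forward copies an earlier value into the gap, and mean substitution fills every gap with the same value. Both ignore the other columns, and both can introduce bias and distort the underlying data distribution (mean substitution, for example, artificially shrinks the column's variance).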
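The idea behind multiple imputation can be illustrated with a minimal sketch. It is not part of the exam material, and everything in it is an assumption made for illustration: the synthetic data, the single predictor column `xs`, the helper names `fit_line`, `multiply_impute`, and `pooled_mean`, and the simplified scheme (a bootstrap refit of a linear regression per imputation, plus random residual noise). A production workflow would more likely use an established imputation library.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def multiply_impute(xs, ys, m=5, seed=0):
    """Return m completed copies of ys, with each None filled in.

    Each copy refits the regression on a bootstrap resample of the
    observed rows (to reflect uncertainty in the fitted line) and adds
    random residual noise to every imputed value.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    completed = []
    for _ in range(m):
        sample = [rng.choice(observed) for _ in observed]
        # Redraw in the unlikely case that all resampled x values coincide.
        while len({x for x, _ in sample}) < 2:
            sample = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([x for x, _ in sample], [y for _, y in sample])
        completed.append(
            [y if y is not None else a + b * x + rng.gauss(0, sd)
             for x, y in zip(xs, ys)]
        )
    return completed


def pooled_mean(completed):
    """Pool the per-dataset estimates (here, the column mean) by averaging."""
    return statistics.fmean(statistics.fmean(col) for col in completed)


# Synthetic data (assumed for illustration): y depends linearly on x,
# and 6 of the 20 y values (30%) are missing.
data_rng = random.Random(42)
xs = [float(i) for i in range(20)]
full = [1 + 2 * x + data_rng.gauss(0, 1) for x in xs]
ys = [None if i % 10 in (1, 4, 7) else y for i, y in enumerate(full)]

completed = multiply_impute(xs, ys, m=5)
estimate = pooled_mean(completed)
print(f"pooled mean of y across {len(completed)} imputations: {estimate:.2f}")
```

Observed values are left untouched, while each imputed value differs from one completed dataset to the next. That spread between datasets is what separates multiple imputation from single-value methods such as mean substitution: the final estimate is pooled across all completed datasets rather than computed from one filled-in guess.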