SAS Statistical Business Analysis Using SAS 9: Regression and Modeling — Question 27
When mean imputation is performed on data after the data is partitioned for honest assessment, what is the most appropriate method for handling the mean imputation?
Answer options
- A. The sample means from the validation data set are applied to the training and test data sets.
- B. The sample means from the training data set are applied to the validation and test data sets.
- C. The sample means from the test data set are applied to the training and validation data sets.
- D. The sample means from each partition of the data are applied to their own partition.
Correct answer: B
Explanation
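The correct answer is B. The imputation values are part of the model-building process, so they must be estimated from the training data only and then reused unchanged on the validation and test data. This keeps information from the holdout partitions out of model fitting (avoiding data leakage) and mirrors deployment, where new cases are scored with the values stored from training. Options A and C compute the means from holdout data and apply them to the training set, which leaks holdout information into model fitting and can produce optimistic performance estimates. Option D imputes each partition with its own means, so the validation and test sets are preprocessed differently from the training data; the assessment then no longer reflects how the fitted model would treat new cases, for which only the training means are available.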
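The approach in option B can be illustrated with a minimal sketch. It is written in plain Python rather than SAS, and the helper names (`fit_means`, `impute`), the `income` column, and the toy values are all hypothetical, chosen only for illustration. The means are estimated once from the training partition and then applied to all three partitions.

```python
from statistics import mean

def fit_means(rows, columns):
    """Compute per-column means from the non-missing values in `rows`."""
    return {
        col: mean(r[col] for r in rows if r[col] is not None)
        for col in columns
    }

def impute(rows, means):
    """Return copies of `rows` with missing values replaced by the given means."""
    return [
        {col: (means[col] if col in means and val is None else val)
         for col, val in r.items()}
        for r in rows
    ]

# Hypothetical toy partitions; None marks a missing value.
train = [{"income": 40.0}, {"income": 60.0}, {"income": None}]
valid = [{"income": None}, {"income": 90.0}]
test = [{"income": 10.0}, {"income": None}]

# Estimate the imputation values from the training partition only.
train_means = fit_means(train, ["income"])

# Apply the same training means to every partition.
train_imp = impute(train, train_means)
valid_imp = impute(valid, train_means)
test_imp = impute(test, train_means)
```

In this sketch the missing `income` values in the validation and test partitions are filled with the training mean, not with a mean computed from their own rows, which is the distinction between options B and D.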