Designing and Implementing a Data Science Solution on Azure — Question 13
You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has missing values in many columns. The data does not require the application of predictors for each column. You plan to use the Clean Missing Data module.
You need to select a data cleaning method.
Which method should you use?
Answer options
- A. Replace using Probabilistic PCA
- B. Normalization
- C. Synthetic Minority Oversampling Technique (SMOTE)
- D. Replace using MICE
Correct answer: A
Explanation
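The correct answer is A. Replace using Probabilistic PCA fills missing values from a linear model that analyzes the correlations between columns and estimates a low-dimensional approximation of the data. It does not require a predictor for each column; instead, it approximates the covariance of the full dataset, so it can work well when many columns have missing values, which matches this scenario. Replace using MICE (Multivariate Imputation by Chained Equations) models each variable with missing data conditionally on the other variables, which is the per-column predictor approach the scenario rules out. Normalization and SMOTE do not fill missing values and are not cleaning modes of the Clean Missing Data module: Normalization rescales data, and SMOTE balances classes in imbalanced datasets.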
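To make the "no predictor per column" idea concrete, here is a minimal sketch of a PCA-style replacement in Python with NumPy. It is a simplified stand-in (an iterative low-rank reconstruction), not Probabilistic PCA itself and not the code that the Clean Missing Data module runs. The function name `pca_impute`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def pca_impute(X, n_components=1, n_iter=50):
    """Fill NaNs using a low-rank PCA reconstruction of the whole matrix.

    Hypothetical helper for illustration: a simplified, non-probabilistic
    relative of Probabilistic PCA imputation.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start by filling each gap with its column mean (observed values only).
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        # One decomposition of the full, centered matrix; no per-column model.
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Overwrite only the missing cells; observed values stay untouched.
        filled = np.where(missing, recon, filled)

    return filled


# Toy data (assumed): three correlated columns with gaps in every column.
X = np.array([
    [1.0, 2.1, np.nan],
    [2.0, np.nan, -2.0],
    [np.nan, 6.0, -3.1],
    [4.0, 8.2, -4.0],
    [5.0, 9.9, -5.1],
])

X_filled = pca_impute(X, n_components=1)
```

Each pass fits a single decomposition to the whole matrix, so every column with gaps is filled from the same shared structure. A MICE-style approach would instead fit a separate regression model for each column that has missing values.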