AWS Certified Machine Learning – Specialty — Question 338

A data scientist uses Amazon SageMaker Data Wrangler to analyze and visualize data. The data scientist wants to refine a training dataset by selecting predictor variables that are strongly predictive of the target variable. The target variable correlates with other predictor variables.

The data scientist wants to understand the variance in the data along various directions in the feature space.

Which solution will meet these requirements?

Answer options

Correct answer: C

Explanation

Principal Component Analysis (PCA) is an unsupervised technique that finds orthogonal directions (principal components) in the feature space and measures how much of the data's variance lies along each one. That is exactly what the data scientist wants to understand. SageMaker Data Wrangler includes PCA in its Multicollinearity analysis, where it measures the variance of the data along different directions in the feature space. The other options, such as the Variance Inflation Factor (VIF) or the Data Quality and Insights Report, help assess feature correlation and predictive power. However, they do not show how the variance is distributed across directions in the feature space.
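The following is a minimal NumPy sketch of the idea behind PCA, not Data Wrangler's implementation. `pca_variance` is a hypothetical helper, the three-feature dataset is synthetic, and standardizing the columns first is an assumption made so that features on different scales contribute comparably. The sketch computes the variance along each orthogonal principal direction. Two of the features are nearly collinear, so one direction carries almost no variance.

```python
import numpy as np

def pca_variance(X):
    """Hypothetical helper: variance along each principal direction of X.

    X is an (n_samples, n_features) array. Columns are standardized first
    (an assumption for this sketch) so every feature has unit variance.
    Returns (variances, ratios, components), sorted by decreasing variance.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the orthogonal principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = s**2 / (Z.shape[0] - 1)   # variance of the data along each direction
    ratios = variances / variances.sum()  # share of total variance per direction
    return variances, ratios, Vt

# Synthetic data: x2 is nearly a copy of x1; x3 is independent.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

variances, ratios, components = pca_variance(X)
print("explained variance ratios:", np.round(ratios, 3))
# A tiny trailing variance means some direction is almost flat,
# i.e. a combination of features is nearly collinear.
print("largest/smallest singular value ratio:", np.sqrt(variances[0] / variances[-1]))
```

With this data, the first direction (roughly x1 + x2) captures about two thirds of the variance, x3 captures about one third, and the direction roughly x1 - x2 captures almost none. This is the kind of picture a variance-by-direction analysis provides and a per-feature score such as VIF does not.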