AWS Certified Machine Learning – Specialty — Question 172

A data engineer at a bank is evaluating a new tabular dataset that includes customer data. The data engineer will use the customer data to create a new model to predict customer behavior. After creating a correlation matrix for the variables, the data engineer notices that many of the 100 features are highly correlated with each other.
Which steps should the data engineer take to address this issue? (Choose two.)

Answer options

Correct answer: B, C

Explanation

The correct answers are B and C. Principal component analysis (PCA) reduces dimensionality by transforming the correlated features into a smaller set of uncorrelated components, and removing a portion of the highly correlated features directly eliminates the redundancy in the dataset. Using a linear-based algorithm (A), applying min-max scaling (D), and applying one-hot encoding (E) do not mitigate high correlation among features: linear models are themselves sensitive to multicollinearity, min-max scaling is a linear rescaling that leaves correlations unchanged, and one-hot encoding only converts categorical variables to numeric columns.
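
As an illustration of the two correct options, here is a minimal sketch using NumPy on synthetic data. The helper names (drop_correlated, pca), the 0.9 threshold, and the generated dataset are assumptions made for this example and do not come from the question. In practice, features on different scales would usually be standardized before PCA.

```python
import numpy as np


def drop_correlated(X, threshold=0.9):
    """Greedy filter: return indices of columns to keep.

    A column is kept only if its absolute Pearson correlation with
    every previously kept column is below `threshold` (assumed value).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


def pca(X, n_components):
    """Project X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


# Hypothetical data: 3 independent signals plus 3 noisy near-duplicates.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
X = np.hstack([base, base + 0.01 * rng.normal(size=(500, 3))])

# Option C: remove a portion of the highly correlated features.
kept = drop_correlated(X, threshold=0.9)
X_filtered = X[:, kept]

# Option B: replace the features with uncorrelated principal components.
Z = pca(X, n_components=3)
```

On this synthetic data the filter keeps the first three columns and drops their near-duplicates, while the columns of Z have (numerically) zero covariance with one another. The filter is easier to interpret because it retains original features; PCA retains more of the variance but yields components that are linear combinations of the inputs.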