AWS Certified Machine Learning – Specialty — Question 174

A machine learning (ML) specialist must develop a classification model for a financial services company. A domain expert provides the dataset, which is tabular with 10,000 rows and 1,020 features. During exploratory data analysis, the specialist finds no missing values and a small percentage of duplicate rows. There are correlation scores of > 0.9 for 200 feature pairs. The mean value of each feature is similar to its 50th percentile.
Which feature engineering strategy should the ML specialist use with Amazon SageMaker?

Answer options

Correct answer: A

Explanation

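The correct answer is A. With 200 feature pairs correlated above 0.9, much of the 1,020-feature space is redundant, and 10,000 rows is a small sample for a table that wide. Dimensionality reduction with principal component analysis (PCA) projects the correlated features onto a smaller set of uncorrelated components while retaining most of the variance. Because each feature's mean is close to its median, the distributions are roughly symmetric, so skew or extreme outliers are not the main concern. The other options do not address the high correlation: dropping features (B) risks losing important information, and anomaly detection (C) and concatenation of features (D) do nothing to reduce the redundancy.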
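
The effect of PCA on highly correlated features can be illustrated with a minimal sketch in plain Python. It is not the SageMaker built-in PCA algorithm and does not use any SageMaker API. The three-feature synthetic dataset, the feature names f1 to f3, and the helper functions (standardize, covariance, top_component, deflate) are assumptions made for illustration only. Two of the features are near-duplicates, so two principal components should retain almost all of the variance of the three original features.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic stand-in data (an assumption, not the exam dataset):
# f1 and f2 are near-duplicates, f3 is independent of both.
rows = []
for _ in range(200):
    x = random.gauss(0, 1)
    rows.append([x, x + random.gauss(0, 0.1), random.gauss(0, 1)])


def standardize(data):
    """Center each column to mean 0 and scale it to unit sample variance."""
    n, d = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in data) / (n - 1))
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in data]


def covariance(data):
    """Sample covariance of already-centered data (d x d matrix)."""
    n, d = len(data), len(data[0])
    return [
        [sum(r[i] * r[j] for r in data) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_component(m, iters=500):
    """Power iteration: largest eigenvalue of m and its unit eigenvector."""
    d = len(m)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v


def deflate(m, eigenvalue, v):
    """Remove a found component so the next power iteration finds the next one."""
    d = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]


# On standardized data the covariance matrix equals the correlation matrix.
cov = covariance(standardize(rows))
lam1, v1 = top_component(cov)
lam2, v2 = top_component(deflate(cov, lam1, v1))

total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = (lam1 + lam2) / total_variance

print(f"corr(f1, f2) = {cov[0][1]:.3f}")
print(f"variance kept by 2 of 3 components = {explained:.3f}")
```

In this toy setting the correlated pair collapses into a single component, which is the same reason PCA suits a dataset with 200 highly correlated feature pairs. In practice a library or managed implementation would replace the hand-written power iteration, and the number of components would be chosen from the cumulative explained variance.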