AWS Certified Machine Learning – Specialty — Question 79

A credit card company wants to build a credit scoring model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, the large number of features slows down the training speed significantly, and that there are some overfitting issues.
The Data Scientist on this project would like to speed up the model training time without losing a lot of information from the original dataset.
Which feature engineering technique should the Data Scientist use to meet the objectives?

Answer options

Correct answer: C

Explanation

The correct answer is C because using an autoencoder or PCA helps to reduce dimensionality while capturing the most important information from the dataset, thus speeding up training. Option A focuses only on removing correlated features, which may not address other issues like overfitting. Option B does not reduce the number of features, and while normalization is beneficial, it does not address the training speed issue. Option D may lead to loss of information and doesn't directly tackle the problem of feature correlation.