AWS Certified Machine Learning – Specialty — Question 271
A company is building a new supervised classification model in an AWS environment. The company's data science team notices that the dataset has a large quantity of variables. All the variables are numeric.
The model accuracy for training and validation is low. The model's processing time is affected by high latency. The data science team needs to increase the accuracy of the model and decrease the processing time.
What should the data science team do to meet these requirements?
Answer options
- A. Create new features and interaction variables.
- B. Use a principal component analysis (PCA) model.
- C. Apply normalization on the feature set.
- D. Use a multiple correspondence analysis (MCA) model.
Correct answer: B
Explanation
Principal component analysis (PCA) is an unsupervised dimensionality reduction technique suited to datasets with many numeric variables. It projects the features onto a smaller set of uncorrelated components that retain most of the variance, so the downstream classifier trains and predicts on fewer inputs, which lowers processing time. The reduced dimensionality can also improve accuracy by mitigating the curse of dimensionality. Creating new features and interaction variables (Option A) would increase dimensionality and worsen latency. Normalization (Option C) rescales the variables but does not reduce their number. Multiple correspondence analysis (Option D) is designed for categorical data, so it does not fit this all-numeric dataset.
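To make the effect of Option B concrete, here is a minimal sketch of PCA implemented with NumPy's SVD. It is an illustration under stated assumptions, not the exam's reference solution: the helper name `pca_reduce`, the 95% variance target, and the synthetic 50-feature dataset are all hypothetical choices made for this example.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.95):
    """Project X (n_samples, n_features) onto the fewest principal
    components whose cumulative explained variance reaches the target."""
    # Standardize each column so no feature dominates because of its scale.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std

    # SVD of the standardized data: the rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)

    # Smallest k whose cumulative explained variance meets the target.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    k = min(k, len(S))
    return Z @ Vt[:k].T, explained[:k]


# Synthetic data (assumption): 50 numeric features driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 50))

X_reduced, ratios = pca_reduce(X)
print(X.shape, "->", X_reduced.shape)
print("variance retained:", round(float(ratios.sum()), 4))
```

The classifier would then be trained on `X_reduced` instead of `X`. Because the redundant columns collapse into a handful of components, the model has far fewer inputs to process, which is the latency benefit the question is looking for. Note that the sketch standardizes the features first: normalization (Option C) is a common preprocessing step before PCA, but on its own it leaves the number of variables unchanged.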