AWS Certified Machine Learning – Specialty — Question 156
A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team has not provided any insight about which features are relevant for churn prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide gap between the training and validation set accuracy.
Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team's needs? (Choose two.)
Answer options
- A. Add L1 regularization to the classifier
- B. Add features to the dataset
- C. Perform recursive feature elimination
- D. Perform t-distributed stochastic neighbor embedding (t-SNE)
- E. Perform linear discriminant analysis
Correct answer: A, C
Explanation
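The wide gap between training and validation accuracy indicates overfitting, which is likely with 100 features and no prior knowledge of which ones matter. L1 regularization (A) penalizes the sum of the absolute values of the coefficients, which drives the coefficients of uninformative features to exactly zero. This reduces overfitting and leaves a sparse logistic regression whose remaining coefficients show the direct impact of each retained feature. Recursive feature elimination (C) repeatedly fits the model and discards the weakest features until a chosen number remain, so the final model is simpler and is still expressed in terms of the original features. Adding features (B) increases model complexity and tends to widen the gap. t-SNE (D) is a nonlinear embedding intended mainly for visualization, and its output dimensions cannot be mapped back to individual features. Linear discriminant analysis (E) projects the data onto linear combinations of all features, so the direct effect of any single feature is no longer visible.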
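As a rough illustration of options A and C, the sketch below uses scikit-learn on synthetic data. The question names no library or dataset, so everything here is an assumption made for demonstration: the data from `make_classification`, the regularization strength `C=0.1`, and the target of 10 features for RFE.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption): 100 continuous
# features, of which only 10 actually carry signal.
X, y = make_classification(
    n_samples=500, n_features=100, n_informative=10,
    n_redundant=0, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Option A: L1-regularized logistic regression. Smaller C means a stronger
# penalty; C=0.1 is an illustrative value, not a recommendation.
l1_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
l1_model.fit(X_train, y_train)
l1_coefs = l1_model.named_steps["logisticregression"].coef_.ravel()
n_kept_l1 = int(np.count_nonzero(l1_coefs))
print("L1: features with non-zero coefficients:", n_kept_l1)
print("L1: train accuracy:", l1_model.score(X_train, y_train))
print("L1: validation accuracy:", l1_model.score(X_val, y_val))

# Option C: recursive feature elimination. Each round refits the model and
# drops the 5 features with the smallest absolute coefficients until 10 remain.
rfe_model = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=5),
)
rfe_model.fit(X_train, y_train)
rfe = rfe_model.named_steps["rfe"]
n_kept_rfe = int(rfe.support_.sum())
print("RFE: selected feature indices:", np.flatnonzero(rfe.support_))
print("RFE: train accuracy:", rfe_model.score(X_train, y_train))
print("RFE: validation accuracy:", rfe_model.score(X_val, y_val))
```

In both cases the surviving coefficients refer to the original (standardized) features, which is what allows the Marketing team to read off each feature's effect. In practice `C` and the number of features to keep would be chosen by cross-validation rather than fixed by hand.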