AWS Certified Machine Learning – Specialty — Question 275
A data engineer is evaluating customer data in Amazon SageMaker Data Wrangler. The data engineer will use the customer data to create a new model to predict customer behavior.
The engineer needs to increase the model performance by checking for multicollinearity in the dataset.
Which steps can the data engineer take to accomplish this with the LEAST operational effort? (Choose two.)
Answer options
- A. Use SageMaker Data Wrangler to refit and transform the dataset by applying one-hot encoding to category-based variables.
- B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.
- C. Use the SageMaker Data Wrangler Quick Model visualization to quickly evaluate the dataset and to produce importance scores for each feature.
- D. Use the SageMaker Data Wrangler Min Max Scaler transform to normalize the data.
- E. Use SageMaker Data Wrangler diagnostic visualization. Use least absolute shrinkage and selection operator (LASSO) to plot coefficient values from a LASSO model that is trained on the dataset.
Correct answer: B, E
Explanation
SageMaker Data Wrangler's diagnostic visualization can detect multicollinearity directly, without custom code, which is why B and E need the least operational effort. With PCA/SVD, singular values close to zero indicate that some features are nearly linear combinations of others. With LASSO, the coefficient plot shows which features the L1 penalty shrinks to (or near) zero, which is typical of features that are redundant with others. One-hot encoding (A) and Min Max scaling (D) are feature engineering transforms, not diagnostics, and one-hot encoding can itself introduce collinear columns. The Quick Model visualization (C) estimates feature importance and model viability; it does not specifically diagnose multicollinearity.
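The two diagnostics behind B and E can be illustrated outside Data Wrangler. The following is a minimal sketch on synthetic data using NumPy only; it is not Data Wrangler's implementation. The function names, the noise levels, the penalty `lam`, and the simple coordinate-descent LASSO are all assumptions made for illustration. Feature `x3` is built as a noisy copy of `x1`, so the two are collinear.

```python
import numpy as np

def make_data(n=5000, seed=0):
    """Hypothetical synthetic data: x3 is a noisy copy of x1 (collinear pair)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = x1 + 0.3 * rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])
    # The target depends on x1 and x2 only; x3 carries no extra information.
    y = 3.0 * x1 - 2.0 * x2 + 0.1 * rng.normal(size=n)
    return X, y

def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def singular_values(X):
    """Singular values of the standardized feature matrix, largest first.
    A value that is small relative to the largest signals near-linear dependence."""
    return np.linalg.svd(standardize(X), compute_uv=False)

def lasso_coefficients(X, y, lam=0.1, n_sweeps=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1
    on standardized features and a centered target."""
    Xs = standardize(X)
    yc = y - y.mean()
    n, p = Xs.shape
    w = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial_residual = yc - Xs @ w + Xs[:, j] * w[j]
            rho = Xs[:, j] @ partial_residual / n
            # Soft-thresholding; each standardized column has mean square 1.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)
    return w

X, y = make_data()
print("singular values:", np.round(singular_values(X), 2))
print("LASSO coefficients:", np.round(lasso_coefficients(X, y), 3))
```

In this setup the smallest singular value is much smaller than the largest, and the LASSO coefficient for the redundant feature `x3` is driven to zero while `x1` and `x2` keep large coefficients. These are the same patterns to look for in the PCA/SVD and LASSO diagnostic plots.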