A data engineer is evaluating customer data in Amazon SageMaker Data Wrangler. The data e…

Question

A data engineer is evaluating customer data in Amazon SageMaker Data Wrangler. The data engineer will use the customer data to create a new model to predict customer behavior. The engineer needs to increase the model performance by checking for multicollinearity in the dataset. Which steps can the data engineer take to accomplish this with the LEAST operational effort? (Choose two.)

Accepted Answer

Correct answer: B, E.

B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.

E. Use SageMaker Data Wrangler diagnostic visualization. Use least absolute shrinkage and selection operator (LASSO) to plot coefficient values from a LASSO model that is trained on the dataset.

SageMaker Data Wrangler's diagnostic visualization supports PCA/SVD for calculating singular values and LASSO for plotting coefficient values. Both are built-in ways to detect multicollinearity, so they need the least operational effort. With PCA/SVD, singular values that vary widely in magnitude, with some close to zero, indicate that some features are nearly linear combinations of others. With LASSO, coefficients that shrink to or near zero point to features that add little beyond what the other features already provide.

One-hot encoding and Min Max scaling are feature engineering transformations, not diagnostic tools for multicollinearity. The Quick Model visualization is designed to estimate feature importance and model viability, not to diagnose multicollinearity specifically.
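To make the PCA/SVD idea concrete, here is a minimal NumPy sketch of the kind of check a singular value analysis performs. It is not Data Wrangler's implementation, and everything in it is an assumption made for illustration: the synthetic features x1, x2, x3, the helper names standardize and collinearity_report, and the noise level. It covers only the singular value part of the answer, not LASSO.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def collinearity_report(X):
    """Return singular values, the condition number, and the right singular
    vector that belongs to the smallest singular value."""
    Z = standardize(X)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    # np.linalg.svd returns singular values in descending order,
    # so s[-1] is the smallest and vt[-1] is its direction.
    return s, s[0] / s[-1], vt[-1]

# Hypothetical data: x2 is almost an exact copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)

collinear = np.column_stack([x1, x2, x3])
independent = np.column_stack([x1, x3])

s_bad, cond_bad, weakest_direction = collinearity_report(collinear)
s_ok, cond_ok, _ = collinearity_report(independent)

print("singular values (with near-duplicate):", s_bad)
print("condition number (with near-duplicate):", cond_bad)
print("loadings of weakest direction:", weakest_direction)
print("condition number (independent features):", cond_ok)
```

In this sketch the near-duplicate column produces one singular value far smaller than the others, so the condition number is large. The loadings of the weakest direction fall almost entirely on x1 and x2 with opposite signs, which identifies the redundant pair. The independent feature set keeps a condition number close to 1.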

AWS Certified Machine Learning – Specialty — Question 275
