AWS Certified Machine Learning – Specialty — Question 311
A machine learning (ML) developer for an online retailer recently uploaded a sales dataset into Amazon SageMaker Studio. The ML developer wants to obtain importance scores for each feature of the dataset. The ML developer will use the importance scores to feature engineer the dataset.
Which solution will meet this requirement with the LEAST development effort?
Answer options
- A. Use SageMaker Data Wrangler to perform a Gini importance score analysis.
- B. Use a SageMaker notebook instance to perform principal component analysis (PCA).
- C. Use a SageMaker notebook instance to perform a singular value decomposition analysis.
- D. Use the multicollinearity feature to perform a lasso feature selection to perform an importance scores analysis.
Correct answer: A
Explanation
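Amazon SageMaker Data Wrangler is the low-code data preparation tool built into SageMaker Studio, where the dataset already resides. Its Quick Model analysis trains a random forest on the data and reports an importance score for each feature, computed with the Gini importance method, without any custom code. Options B and C require writing and maintaining notebook code, and PCA and singular value decomposition produce transformed components rather than a score for each original feature. Option D uses Data Wrangler's multicollinearity analysis, which is intended mainly to detect correlated features rather than to rank features by importance. Option A therefore meets the requirement with the least development effort.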
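To make the idea of a Gini importance score concrete, the following is a minimal pure-Python sketch. It is not Data Wrangler's implementation: it scores each feature by the Gini impurity decrease from a single binary split, whereas tree-based Gini importance accumulates that decrease over every split in every tree. The toy rows, the feature names `discounted` and `weekend`, and the helper functions are all hypothetical and exist only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(rows, labels, feature):
    """Impurity reduction from one binary split on `feature` (truthy vs. falsy)."""
    left = [y for row, y in zip(rows, labels) if row[feature]]
    right = [y for row, y in zip(rows, labels) if not row[feature]]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical toy data: label 1 means the order was returned.
rows = [
    {"discounted": 1, "weekend": 0},
    {"discounted": 1, "weekend": 1},
    {"discounted": 1, "weekend": 0},
    {"discounted": 0, "weekend": 1},
    {"discounted": 0, "weekend": 0},
    {"discounted": 0, "weekend": 1},
]
labels = [1, 1, 1, 0, 0, 0]

# Raw impurity decrease per feature, then normalized so the scores sum to 1.
raw = {f: impurity_decrease(rows, labels, f) for f in ("discounted", "weekend")}
total = sum(raw.values())
importance = {f: value / total for f, value in raw.items()}

for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

In this toy dataset `discounted` separates the two classes perfectly, so it receives most of the normalized score. In Data Wrangler the equivalent ranking comes from the built-in analysis, which is why option A involves the least development effort.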