AWS Certified Machine Learning – Specialty — Question 342
A finance company has collected stock return data for 5,000 publicly traded companies. A financial analyst has a dataset that contains 2,000 attributes for each company. The financial analyst wants to use Amazon SageMaker to identify the top 15 attributes that are most valuable to predict future stock returns.
Which solution will meet these requirements with the LEAST operational overhead?
Answer options
- A. Use the linear learner algorithm in SageMaker to train a linear regression model to predict the stock returns. Identify the most predictive features by ranking absolute coefficient values.
- B. Use random forest regression in SageMaker to train a model to predict the stock returns. Identify the most predictive features based on Gini importance scores.
- C. Use an Amazon SageMaker Data Wrangler quick model visualization to predict the stock returns. Identify the most predictive features based on the quick model's feature importance scores.
- D. Use Amazon SageMaker Autopilot to build a regression model to predict the stock returns. Identify the most predictive features based on an Amazon SageMaker Clarify report.
Correct answer: C
Explanation
Amazon SageMaker Data Wrangler includes a quick model visualization that trains a model on the dataset and reports an importance score for each feature, with no code to write. The analyst selects the target column (future stock return) and reads the top 15 attributes directly from the ranked scores. Options A and B require configuring and running a training job, then extracting and ranking the coefficients or importance scores manually. Option D requires running a full Autopilot job and then interpreting a Clarify report, which takes considerably more time and setup than a quick model. Option C therefore meets the requirement with the least operational overhead.
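To make "rank the features by importance and keep the top 15" concrete, here is a minimal sketch in plain Python. It is not Data Wrangler's method: as a stand-in importance score it uses the absolute Pearson correlation between each attribute and the target. The data is synthetic, and the names `pearson`, `top_k_features`, and `attr_N` are hypothetical, chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def top_k_features(columns, target, k=15):
    """Return the k column names with the highest stand-in importance score.

    Assumption: importance = |Pearson correlation with the target|.
    This is a simple proxy, not the score Data Wrangler computes.
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic stand-in for the company dataset (scaled down from 5,000 x 2,000).
rng = random.Random(0)
n_rows, n_features, n_informative = 300, 100, 15
columns = {
    f"attr_{i}": [rng.gauss(0, 1) for _ in range(n_rows)]
    for i in range(n_features)
}
# The target depends only on the first n_informative attributes, plus noise.
target = [
    sum(columns[f"attr_{i}"][r] for i in range(n_informative)) + rng.gauss(0, 0.5)
    for r in range(n_rows)
]

top15 = top_k_features(columns, target, k=15)
print(top15)
```

The selection step is the same whatever produces the scores: sort in descending order and keep the first 15. The quick model removes the need to write or run any of this by hand.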