AWS Certified Machine Learning – Specialty — Question 83
A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices.
Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model's complexity?
Answer options
- A. Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
- B. Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
- C. Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
- D. Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.
Correct answer: D
Explanation
The correct answer is D because running a correlation check against the target variable helps identify which features have a significant relationship with the sale price, allowing for the removal of those that do not contribute meaningfully to the prediction. Options A and B focus on variance without considering the target variable, which may not effectively reduce complexity. Option C looks at mutual correlation among features but does not assess their relevance to the target variable.