AWS Certified Machine Learning – Specialty — Question 233
An online store is predicting future book sales by using a linear regression model that is based on past sales data. The data includes duration, a numerical feature that represents the number of days that a book has been listed in the online store. A data scientist performs an exploratory data analysis and discovers that the relationship between book sales and duration is skewed and non-linear.
Which data transformation step should the data scientist take to improve the predictions of the model?
Answer options
- A. One-hot encoding
- B. Cartesian product transformation
- C. Quantile binning
- D. Normalization
Correct answer: C
Explanation
The correct answer is C, Quantile binning, as it can help to address the skewed and non-linear relationship by grouping data into bins based on quantiles, which can stabilize variance and improve the linearity of the relationship. The other options, such as One-hot encoding and Normalization, are not suitable for addressing non-linear relationships in this context, while Cartesian product transformation does not apply to this situation.