AWS Certified Machine Learning – Specialty — Question 68
A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices that the income and age distributions are not normally distributed. While income levels show a right skew as expected, with fewer individuals having higher incomes, the age distribution also shows a right skew, with fewer older individuals participating in the workforce.
Which feature transformations can the Data Scientist apply to fix the incorrectly skewed data? (Choose two.)
Answer options
- A. Cross-validation
- B. Numerical value binning
- C. High-degree polynomial transformation
- D. Logarithmic transformation
- E. One hot encoding
Correct answer: B, D
Explanation
B (Numerical value binning) can reduce skewness by grouping age into bins. Quantile-based bins, which each hold roughly the same number of observations, produce an approximately uniform distribution. D (Logarithmic transformation) is effective for right-skewed, strictly positive data such as age (and income) because it compresses the range of high values. A (Cross-validation) is a model-evaluation technique and E (One hot encoding) is used for categorical features, so neither corrects skew. C (High-degree polynomial transformation) stretches large values even further and can actually exacerbate skewness.
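The effect of options B, C, and D can be checked on synthetic data. The sketch below is illustrative only. It assumes a hypothetical right-skewed "age" sample drawn from a log-normal distribution (10,000 values rather than the question's 10 million), uses quantile-based bins for B, and uses cubing as a stand-in for a high-degree polynomial in C. The helper names `skewness`, `log_transform`, and `quantile_bins` are hypothetical, and only the Python standard library is used.

```python
import bisect
import math
import random
import statistics


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5, using population (1/n) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Natural log of strictly positive values; compresses the long right tail."""
    return [math.log(v) for v in values]


def quantile_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 so bins hold roughly equal counts."""
    cuts = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect.bisect_right(cuts, v) for v in values]


# Hypothetical right-skewed "age" sample: log-normal, so its log is exactly normal.
rng = random.Random(42)
ages = [rng.lognormvariate(3.5, 0.3) for _ in range(10_000)]

raw_skew = skewness(ages)
log_skew = skewness(log_transform(ages))         # option D
cubed_skew = skewness([a ** 3 for a in ages])    # stand-in for option C
bins = quantile_bins(ages, 4)                    # option B (quartile bins)
bin_counts = [bins.count(b) for b in range(4)]

print(f"raw skewness:        {raw_skew:.2f}")
print(f"log-transformed:     {log_skew:.2f}")
print(f"cubed (polynomial):  {cubed_skew:.2f}")
print(f"quartile bin counts: {bin_counts}")
```

Because the synthetic sample is log-normal, the log transform brings its skewness close to zero, whereas cubing increases it. Equal-width bins would not guarantee the same result: only quantile bins yield the roughly equal counts that make the binned feature approximately uniform.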