AWS Certified Machine Learning – Specialty — Question 345
An ecommerce company sends a weekly email newsletter to all of its customers. Management has hired a team of writers to create additional targeted content. A data scientist needs to identify five customer segments based on age, income, and location. The customers' current segmentation is unknown. The data scientist previously built an XGBoost model to predict the likelihood of a customer responding to an email based on age, income, and location.
Why does the XGBoost model NOT meet the current requirements, and how can this be fixed?
Answer options
- A. The XGBoost model provides a true/false binary output. Apply principal component analysis (PCA) with five feature dimensions to predict a segment.
- B. The XGBoost model provides a true/false binary output. Increase the number of classes the XGBoost model predicts to five classes to predict a segment.
- C. The XGBoost model is a supervised machine learning algorithm. Train a k-Nearest-Neighbors (kNN) model with K = 5 on the same dataset to predict a segment.
- D. The XGBoost model is a supervised machine learning algorithm. Train a k-means model with K = 5 on the same dataset to predict a segment.
Correct answer: D
Explanation
Finding customer segments when the groupings are unknown is an unsupervised clustering task. XGBoost and k-Nearest-Neighbors (kNN) are supervised learning algorithms that require labeled target data, which does not exist in this scenario. Training a k-means clustering model with K = 5 is the correct unsupervised approach to group the unlabeled data into five distinct segments.