AWS Certified Machine Learning – Specialty — Question 284
An online retailer collects the following data on customer orders: demographics, behaviors, location, shipment progress, and delivery time. A data scientist joins all the collected datasets. The result is a single dataset that includes 980 variables.
The data scientist must develop a machine learning (ML) model to identify groups of customers who are likely to respond to a marketing campaign.
Which combination of algorithms should the data scientist use to meet this requirement? (Choose two.)
Answer options
- A. Latent Dirichlet Allocation (LDA)
- B. K-means
- C. Semantic segmentation
- D. Principal component analysis (PCA)
- E. Factorization machines (FM)
Correct answer: B, D
Explanation
With 980 variables, the dataset is high-dimensional, and distance-based clustering degrades in that setting because distances between points become less informative. Principal component analysis (PCA) addresses this by projecting the features onto a much smaller set of uncorrelated components that retain most of the variance. Identifying groups of customers is a clustering task, so K-means can then be applied in the reduced space to segment the customers into distinct groups for targeted marketing. The other options do not fit: Latent Dirichlet Allocation (LDA) is a topic model for text, semantic segmentation is a computer vision task that labels image pixels, and Factorization machines (FM) are a supervised algorithm used primarily for recommendation systems.
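
The PCA-then-K-means flow can be sketched in a few lines. This is a minimal illustration with plain NumPy, not the SageMaker built-in algorithms the exam has in mind. Everything in it is an assumption made for the example: the synthetic 300 x 980 dataset with three planted groups, the choice of 10 components and 3 clusters, and the hypothetical helper names `pca_reduce`, `farthest_first_centers`, and `kmeans`. The K-means here uses a deterministic farthest-first initialization rather than random or k-means++ seeding.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Standardize X, then project it onto its top principal components."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns unscaled
    Z = (X - mu) / sigma
    # Rows of Vt are the principal directions, sorted by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    return Z @ Vt[:n_components].T, explained_ratio[:n_components]

def farthest_first_centers(X, k):
    """Deterministic init: start at row 0, then repeatedly add the farthest row."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for the joined customer dataset (assumption, not real data):
# 3 planted groups of 100 customers each, 980 numeric variables.
rng = np.random.default_rng(42)
n_per_group, n_features, n_groups = 100, 980, 3
true_groups = np.repeat(np.arange(n_groups), n_per_group)
group_means = rng.normal(0.0, 3.0, size=(n_groups, n_features))
X = group_means[true_groups] + rng.normal(0.0, 1.0, size=(len(true_groups), n_features))

# Step 1: reduce 980 variables to 10 components. Step 2: cluster in that space.
reduced, explained_ratio = pca_reduce(X, n_components=10)
labels, centers = kmeans(reduced, k=n_groups)

print("reduced shape:", reduced.shape)
print("cluster sizes:", np.bincount(labels, minlength=n_groups))
```

On real data the number of components and the number of clusters would not be known in advance; a common approach is to pick components from the cumulative explained variance and to compare several values of k with a measure such as within-cluster sum of squares.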