AWS Certified Machine Learning – Specialty — Question 70
A Machine Learning Specialist is given a structured dataset on the shopping habits of a company's customer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible.
What approach should the Specialist take to accomplish these tasks?
Answer options
- A. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot.
- B. Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.
- C. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a line graph.
- D. Run k-means using the Euclidean distance measure for different values of k and create box plots for each numerical column within each cluster.
Correct answer: A
Explanation
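The correct answer is A. t-SNE projects high-dimensional data into two or three dimensions while preserving local neighborhood structure, so points that are similar in the original feature space tend to land near each other. A scatter plot of the embedding therefore shows natural groupings at a glance, without choosing a number of clusters in advance. Option B produces an elbow plot, which helps choose a value of k but does not show the groupings themselves. Option C uses the same embedding but a line graph, which implies an ordering between points and is not appropriate for representing clusters. Option D requires box plots for hundreds of numerical columns in every cluster, which is slow to produce and does not give a direct view of the overall structure the way a scatter plot does.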
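The approach in option A can be sketched in a few lines. This is a minimal illustration, not part of the exam material: it assumes scikit-learn and matplotlib are available, and it uses a synthetic array `X` (3 groups of 50 customers, 100 numerical columns) as a stand-in for the real dataset. The perplexity value, the scaling step, and the output file name are illustrative choices.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data (an assumption for illustration):
# 3 groups of 50 customers, each described by 100 numerical columns.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 100))
X = np.vstack([center + rng.normal(size=(50, 100)) for center in centers])

# Put every column on a comparable scale before computing distances.
X_scaled = StandardScaler().fit_transform(X)

# Embed the numerical features into 2 dimensions with t-SNE.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_scaled)

# Scatter plot of the embedding: each point is one customer.
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(embedding[:, 0], embedding[:, 1], s=10)
ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
ax.set_title("t-SNE embedding of numerical customer features")
fig.savefig("tsne_scatter.png", dpi=150)
```

On real data, the result is sensitive to `perplexity` and to the random seed, so it is worth trying a few values before reading much into the apparent groupings. Distances between well-separated clusters in a t-SNE plot are also not reliable indicators of how far apart those groups are in the original space.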