AWS Certified Machine Learning – Specialty — Question 208

A company wants to segment a large group of customers into subgroups based on shared characteristics. The company’s data scientist is planning to use the Amazon SageMaker built-in k-means clustering algorithm for this task. The data scientist needs to determine the optimal number of subgroups (k) to use.

Which data visualization approach will MOST accurately determine the optimal value of k?

Answer options

Correct answer: D

Explanation

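The correct answer is D. It applies the elbow method: compute the sum of squared errors (SSE), the total squared distance from each point to its assigned cluster centroid, for a range of k values, plot SSE against k as a line chart, and choose the k at the bend where the steep initial drop gives way to a slow, roughly linear decline. Past that point, each additional cluster yields only a marginal reduction in SSE, so the gain in clustering quality no longer justifies the extra cluster. Option A relies on how well PCA components separate the data, and option B relies on PCA's explained variance; both describe the feature space rather than how well a given k fits the data. Option C uses t-SNE, a technique for visualizing high-dimensional data that does not measure clustering quality for a given k.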
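The elbow method described above can be sketched locally in plain Python. This is an illustrative stand-in, not the SageMaker built-in algorithm: the synthetic three-blob data, the helper names (`sq_dist`, `farthest_point_init`, `kmeans_sse`, `elbow_k`), the deterministic farthest-point initialization, and the "largest second difference" rule for locating the bend are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic init (an assumption for this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means and return the final sum of squared errors."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def elbow_k(sse_by_k):
    """Heuristic: pick the interior k with the largest second difference,
    i.e. where the SSE curve bends most sharply. Expects consecutive k values."""
    ks = sorted(sse_by_k)
    return max(
        ks[1:-1],
        key=lambda k: sse_by_k[k - 1] - 2 * sse_by_k[k] + sse_by_k[k + 1],
    )

# Hypothetical customer data: three well-separated 2-D groups of 60 points.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in blob_centers
    for _ in range(60)
]

# SSE for each candidate k, shown as a crude text version of the line plot.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 9)}
max_sse = max(sse_by_k.values())
for k, sse in sse_by_k.items():
    print(f"k={k}  SSE={sse:9.1f}  " + "#" * int(40 * sse / max_sse))

print("Elbow at k =", elbow_k(sse_by_k))
```

In practice the SSE values would more likely be drawn as a line chart (for example with a plotting library) and the bend read off by eye; the second-difference rule is only one simple way to automate that reading and can be unreliable when the clusters overlap.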