AWS Certified Machine Learning – Specialty — Question 213
A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products.
Which solution will meet these requirements with the MOST operational efficiency?
Answer options
- A. Build a custom clustering model. Create a Dockerfile and build a Docker image. Register the Docker image in Amazon Elastic Container Registry (Amazon ECR). Use the custom image in Amazon SageMaker to generate a trained model.
- B. Tokenize the data and transform the data into tabular data. Train an Amazon SageMaker k-means model to generate the product categories.
- C. Train an Amazon SageMaker Neural Topic Model (NTM) model to generate the product categories.
- D. Train an Amazon SageMaker BlazingText model to generate the product categories.
Correct answer: C
Explanation
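Because the documents have no predefined categories, the task is unsupervised. The Amazon SageMaker Neural Topic Model (NTM) is a built-in unsupervised algorithm designed for topic modeling: it groups documents by the topics it learns from their word distributions, and those topics can serve as the product categories. Option A can work, but it adds the overhead of writing a custom model and building and maintaining a container image. Option B relies on k-means, a general-purpose clustering algorithm that is not designed for text, so it depends on additional feature engineering. Option D does not fit because BlazingText text classification is supervised and requires labeled data, while its Word2vec mode produces word embeddings rather than document categories.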
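As an illustration of what a topic model such as NTM works from, the sketch below turns raw documents into bag-of-words count vectors using only the Python standard library. It is a minimal sketch and not part of the original question: the sample documents and the helper names (`tokenize`, `build_vocabulary`, `to_count_vector`) are hypothetical, and the step of writing the vectors to S3 and launching a SageMaker training job is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, max_size=5000):
    """Return up to max_size words, most frequent first (ties broken alphabetically)."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return sorted(counts, key=lambda w: (-counts[w], w))[:max_size]


def to_count_vector(doc, vocab_index):
    """Map one document to a list of per-word counts over the vocabulary."""
    vec = [0] * len(vocab_index)
    for tok in tokenize(doc):
        i = vocab_index.get(tok)
        if i is not None:  # words outside the vocabulary are ignored
            vec[i] += 1
    return vec


# Hypothetical sample documents standing in for the files stored in S3.
docs = [
    "Wireless router setup guide",
    "Router firmware update guide",
    "Blender recipe booklet",
]

vocab = build_vocabulary(docs)
vocab_index = {word: i for i, word in enumerate(vocab)}
vectors = [to_count_vector(d, vocab_index) for d in docs]

for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```

Each row is one document and each column is one vocabulary word. A topic model learns groups of words that tend to occur together across such rows, which is why the two router documents would be expected to share a topic while the blender document would not.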