AWS Certified Machine Learning – Specialty — Question 23
A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset.
Which tool should be used to improve the validation accuracy?
Answer options
- A. Amazon Comprehend syntax analysis and entity detection
- B. Amazon SageMaker BlazingText cbow mode
- C. Natural Language Toolkit (NLTK) stemming and stop word removal
- D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer
Correct answer: D
Explanation
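The correct answer is D. A TF-IDF vectorizer weights each term by how often it occurs in a document, scaled down by how many documents contain it, so distinctive words stand out and ubiquitous words contribute less. This can help improve validation accuracy when the vocabulary is rich and most words are infrequent. Options A and B do not specifically address the vocabulary and frequency issues: Amazon Comprehend syntax analysis and entity detection extract parts of speech and named entities, and BlazingText cbow mode learns word embeddings. Option C may shrink the vocabulary through stemming and stop word removal, but it does not weight words by their importance.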
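To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF. It is not scikit-learn's implementation: the toy reviews, the `tfidf` helper, and the whitespace tokenizer are assumptions made up for illustration. The IDF term uses the smoothed form ln((1 + n) / (1 + df)) + 1 described in the scikit-learn documentation, and the sketch omits the L2 normalization that `TfidfVectorizer` applies by default.

```python
import math
from collections import Counter

# Hypothetical toy reviews, invented for illustration only.
docs = [
    "the movie was great",
    "the movie was terrible",
    "the plot was great and the acting was superb",
]


def tfidf(documents):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    tokenized = [doc.split() for doc in documents]  # naive whitespace tokenizer
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed IDF: a term present in every document gets the minimum value 1.0.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({term: count * idf[term] for term, count in tf.items()})
    return weights


weights = tfidf(docs)

# Print the second review's terms from highest to lowest weight.
for term, weight in sorted(weights[1].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

In the second review, "terrible" appears in only one document and so receives the largest weight, while "the" and "was" appear in every document and receive the smallest. This is the effect the answer relies on: rare but informative words are emphasized instead of being drowned out by common ones.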