AWS Certified Machine Learning – Specialty — Question 23

A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset.
Which tool should be used to improve the validation accuracy?

Answer options

Correct answer: D

Explanation

The correct answer is D. A TF-IDF vectorizer weights each word by how often it appears in a document and discounts words that appear across many documents, so distinctive words stand out and common words carry less weight. With a rich vocabulary and a low average word frequency, that weighting gives the model a more informative representation of the text, which can help improve validation accuracy. Options A and B do not specifically address the vocabulary and frequency issues. Option C may simplify the dataset, but it does not improve how word importance is represented.
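
To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF. It is an illustration only, not the implementation of any particular library: the function name tfidf, the whitespace tokenizer, the smoothed IDF formula (one of several common variants), and the three sample reviews are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({
            # Smoothed IDF variant (assumption): ln((1 + N) / (1 + df)) + 1
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return weights


# Made-up sample data, not taken from the question.
reviews = [
    "the plot was dull",
    "the acting was superb",
    "the soundtrack was forgettable",
]
weights = tfidf(reviews)

# Print the terms of the first review, highest weight first.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:>6}  {weight:.3f}")
```

In this sample, "the" and "was" occur in every review and receive the minimum weight, while words that occur in only one review, such as "dull", are weighted higher. In practice a library vectorizer would typically also handle tokenization, normalization, and sparse output.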