EMC Proven Professional – Data Science and Big Data Analytics — Question 40
A data scientist plans to classify the sentiment polarity of 10,000 product reviews collected from the Internet. Assuming labeled training data is available, what is the most appropriate model to use?
Answer options
- A. Naïve Bayesian classifier
- B. Linear regression
- C. Logistic regression
- D. K-means clustering
Correct answer: A
Explanation
The Naïve Bayesian classifier is the most suitable of the listed models: sentiment polarity is a classification task, labeled training data is available, and Naïve Bayes copes well with the high-dimensional, sparse word-count features typical of text. Linear regression predicts a continuous value, so it is not appropriate for classification. Logistic regression is a valid classifier and could be used here, but Naïve Bayes is often preferred for text classification because it is simple and fast to train. K-means clustering is an unsupervised method that groups unlabeled data, so it does not make use of the labels available in this scenario.
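To make the idea concrete, here is a minimal sketch of a multinomial Naïve Bayes sentiment classifier in plain Python. It is an illustration only: the function names (`train_naive_bayes`, `predict`), the whitespace tokenizer, the add-one (Laplace) smoothing, and the four toy reviews are all assumptions made for this example, not part of the exam question.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            # Log likelihood with add-one (Laplace) smoothing.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set standing in for the labeled reviews.
reviews = [
    "great product works well",
    "love it excellent quality",
    "terrible broke after one day",
    "awful waste of money",
]
labels = ["pos", "pos", "neg", "neg"]

model = train_naive_bayes(reviews, labels)
print(predict(model, "excellent product"))  # pos
print(predict(model, "terrible waste"))     # neg
```

The classifier multiplies a class prior by per-word likelihoods (summed here in log space to avoid underflow), treating words as conditionally independent given the class. That independence assumption is the "naïve" part, and it is what keeps training down to simple counting even with a large vocabulary.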