CompTIA DataX (DY0-001) — Question 80
Which of the following measures would a data scientist most likely use to calculate the similarity of two text strings?
Answer options
- A. Word cloud
- B. Edit distance
- C. String indexing
- D. k-nearest neighbors
Correct answer: B
Explanation
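The correct answer is B, Edit distance. Edit distance (Levenshtein distance is the most common variant) quantifies how different two strings are by counting the minimum number of single-character operations (insertions, deletions, or substitutions) required to transform one string into the other. The smaller the distance, the more similar the strings.
Option A (Word cloud) visualizes word frequencies in text rather than measuring similarity between strings. Option C (String indexing) relates to accessing or organizing text for retrieval, not to comparing strings. Option D (k-nearest neighbors) is a classification and regression algorithm; it relies on a distance metric to find neighbors but is not itself a measure of string similarity.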
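To make the definition concrete, here is a minimal sketch of Levenshtein distance using the standard dynamic-programming approach, written in Python for illustration. The function names `levenshtein` and `similarity` are hypothetical, and normalizing by the length of the longer string to get a 0 to 1 similarity score is one common convention assumed here, not the only option.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing shared."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))           # 3
print(round(similarity("kitten", "sitting"), 3))  # 0.571
```

"kitten" becomes "sitting" with three edits (k→s, e→i, insert g), so the distance is 3 and the normalized similarity is 1 − 3/7 ≈ 0.571.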