CompTIA DataX (DY0-001) — Question 80

Which of the following measures would a data scientist most likely use to calculate the similarity of two text strings?

Answer options

A. Word cloud
B. Edit distance
C. String indexing
D. k-nearest neighbors

Correct answer: B

Explanation

The correct answer is B, Edit distance. Edit distance quantifies how different two strings are by counting the minimum number of single-character edits (typically insertions, deletions, and substitutions) required to transform one string into the other; the smaller the distance, the more similar the strings. Option A (Word cloud) visualizes term frequency in text rather than measuring similarity, option C (String indexing) relates to accessing or organizing text for retrieval, and option D (k-nearest neighbors) is a classification and regression method that relies on a distance measure but is not itself a measure of string similarity.
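
To make the idea concrete, here is a minimal sketch of one common form of edit distance, the Levenshtein distance, written in Python. The function names edit_distance and similarity are illustrative and not part of the exam material, and normalizing by the longer string's length is just one of several possible ways to turn a distance into a similarity score.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # prev[j] is the distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning the first i characters of a into an empty string
        # takes i deletions.
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete ch_a
                curr[j - 1] + 1,                  # insert ch_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative normalization (an assumption, not a standard):
    1.0 means identical strings, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

In this sketch, "kitten" and "sitting" are three edits apart (two substitutions and one insertion), so a lower result indicates more similar strings.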