CompTIA DataX (DY0-001) — Question 71

A data scientist is attempting to identify sentences that are conceptually similar to each other within a set of text files. Which of the following is the best way to prepare the data set to accomplish this task after data ingestion?

Answer options

Correct answer: A

Explanation

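Embeddings are the best choice for capturing semantic similarity between sentences because they convert text into dense vector representations in which sentences with similar meanings lie close together, so similarity can be measured with a metric such as cosine similarity. Extrapolation, sampling, and one-hot encoding do not capture this: extrapolation estimates values beyond the observed range, sampling only selects a subset of the data, and one-hot encoding produces sparse vectors in which every distinct item is equally dissimilar from every other. None of them encodes the context and relationships needed to identify conceptual similarity.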
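
The contrast between embeddings and one-hot encoding can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional word vectors in TOY_VECTORS are hand-picked rather than produced by a trained model, the example sentences are invented, and the helper names (embed, one_hot, cosine) are hypothetical. A real pipeline would obtain vectors from a trained embedding model instead.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-picked for illustration only.
# A real pipeline would take these from a trained embedding model.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleeps": [0.1, 0.9, 0.0],
    "naps": [0.2, 0.8, 0.1],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.0, 0.2, 0.8],
}


def embed(sentence):
    """Build a sentence vector by averaging its word vectors (mean pooling)."""
    vectors = [TOY_VECTORS[word] for word in sentence.lower().split()]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def one_hot(sentence, known_sentences):
    """Encode a sentence as a one-hot vector over the list of known sentences."""
    return [1.0 if sentence == other else 0.0 for other in known_sentences]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sentences = ["cat sleeps", "kitten naps", "stocks fell"]

# Embeddings: related sentences score high, unrelated ones score low.
sim_related = cosine(embed(sentences[0]), embed(sentences[1]))
sim_unrelated = cosine(embed(sentences[0]), embed(sentences[2]))

# One-hot: any two distinct sentences are orthogonal, so similarity is zero.
sim_one_hot = cosine(one_hot(sentences[0], sentences),
                     one_hot(sentences[1], sentences))

print(f"embedding, related sentences:   {sim_related:.3f}")
print(f"embedding, unrelated sentences: {sim_unrelated:.3f}")
print(f"one-hot, related sentences:     {sim_one_hot:.3f}")
```

With these toy values, the embedding similarity for the two related sentences comes out far higher than for the unrelated pair, while the one-hot similarity is zero even though the sentences mean nearly the same thing. Mean pooling is only one simple way to build a sentence vector; it is used here to keep the sketch short.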