Question

A data scientist is attempting to identify sentences that are conceptually similar to each other within a set of text files. Which of the following is the best way to prepare the data set to accomplish this task after data ingestion?

Accepted Answer

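Correct answer: A. Embeddings. Embeddings are the best choice for capturing semantic similarity between sentences because they convert text into dense vector representations in which texts with similar meanings lie close together, so conceptual similarity can be measured with a metric such as cosine similarity. Extrapolation, sampling, and one-hot encoding do not capture the context and relationships between words and sentences. Extrapolation estimates values beyond the observed range, sampling only selects a subset of the data, and one-hot encoding represents each word as an independent sparse vector in which every pair of distinct words is equally dissimilar. This makes all three unsuitable for identifying conceptual similarities.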
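To make the contrast with one-hot encoding concrete, here is a minimal Python sketch under stated assumptions: the word vectors in the hypothetical `TOY_WORD_VECTORS` table are invented 3-dimensional values, not the output of a real model, and each sentence vector is built by simple mean pooling of its word vectors. In practice you would generate embeddings with a trained model (for example, a sentence-transformers model) and compare them with cosine similarity in the same way.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Dimensions loosely mean (vehicle, speed, fruit); a real pipeline would
# obtain much larger vectors from a trained embedding model.
TOY_WORD_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "fast": [0.1, 0.9, 0.0],
    "quick": [0.15, 0.85, 0.05],
    "banana": [0.0, 0.1, 0.95],
    "yellow": [0.05, 0.2, 0.9],
}
DIM = 3


def embed(sentence):
    """Mean-pool the vectors of known words into one dense sentence vector."""
    vectors = [TOY_WORD_VECTORS[w] for w in sentence.lower().split()
               if w in TOY_WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def one_hot_bag(sentence, vocab):
    """Sparse 0/1 vector: one slot per vocabulary word, no notion of meaning."""
    words = set(sentence.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


sentences = ["the car is fast", "the automobile is quick", "the banana is yellow"]
vocab = sorted({w for s in sentences for w in s.split()})

one_hot = [one_hot_bag(s, vocab) for s in sentences]
dense = [embed(s) for s in sentences]

# Compare sentence 0 against a paraphrase (1) and an unrelated sentence (2).
one_hot_sims = (cosine(one_hot[0], one_hot[1]), cosine(one_hot[0], one_hot[2]))
embedding_sims = (cosine(dense[0], dense[1]), cosine(dense[0], dense[2]))

print(f"one-hot:    car/automobile={one_hot_sims[0]:.2f}  car/banana={one_hot_sims[1]:.2f}")
print(f"embeddings: car/automobile={embedding_sims[0]:.2f}  car/banana={embedding_sims[1]:.2f}")
```

Under one-hot encoding both pairs score 0.50, because the only shared words are "the" and "is"; the synonyms "car"/"automobile" and "fast"/"quick" occupy different slots and contribute nothing. With the toy embeddings, the paraphrase pair scores about 1.00 and the unrelated pair about 0.13, which is the behavior that makes embeddings suitable for finding conceptually similar sentences.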

CompTIA DataX (DY0-001) — Question 71