CompTIA DataX (DY0-001) — Question 39
A data scientist wants to digitize historical hard copies of documents. Which of the following is the best method for this task?
Answer options
- A. Word2vec
- B. Optical character recognition
- C. Latent semantic analysis
- D. Semantic segmentation
Correct answer: B
Explanation
Optical character recognition (OCR) is specifically designed to convert different types of documents, such as scanned paper documents or PDFs, into editable and searchable data. The other options, while related to text processing and analysis, do not perform the task of digitizing hard copy documents.