AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 100
A company needs to extract entities from a PDF document to build a classifier model.
Which solution will extract and store the entities in the LEAST amount of time?
Answer options
- A. Use Amazon Comprehend to extract the entities. Store the output in Amazon S3.
- B. Use an open source AI optical character recognition (OCR) tool on Amazon SageMaker to extract the entities. Store the output in Amazon S3.
- C. Use Amazon Textract to extract the entities. Use Amazon Comprehend to convert the entities to text. Store the output in Amazon S3.
- D. Use Amazon Textract integrated with Amazon Augmented AI (Amazon A2I) to extract the entities. Store the output in Amazon S3.
Correct answer: A
Explanation
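The correct answer is A because Amazon Comprehend is a managed natural language processing service with built-in entity recognition, so it needs the fewest steps: one service call to extract the entities, then a write to Amazon S3. There is no model to train and no infrastructure to deploy. Option B requires setting up and hosting an open source OCR tool on Amazon SageMaker, and OCR on its own produces text, not entities. Option C reverses the roles of the two services: Amazon Textract extracts text from documents, and Amazon Comprehend extracts entities from text rather than converting entities to text. Option D adds a human review workflow through Amazon A2I, which increases the time to a result.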
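As a rough illustration of option A, the sketch below calls the Comprehend `detect_entities` API through boto3 and writes the result to S3 with `put_object`. It assumes the document text is already available as a UTF-8 string. Comprehend also documents a `Bytes` input for PDF files, but whether it applies depends on the model in use, so it is left out here. The function name, bucket, and key are hypothetical, and the clients are passed in so the function can be exercised without AWS credentials.

```python
import json


def extract_and_store_entities(comprehend, s3, text, bucket, key, language_code="en"):
    """Detect entities in `text` and store them as JSON in S3.

    `comprehend` and `s3` are boto3 clients (or objects with the same methods).
    The function name and the output layout are illustrative assumptions.
    """
    # Built-in entity recognition: no model training or hosting is needed.
    response = comprehend.detect_entities(Text=text, LanguageCode=language_code)

    # Keep only the fields a downstream classifier is likely to need.
    entities = [
        {"text": e["Text"], "type": e["Type"], "score": e["Score"]}
        for e in response["Entities"]
    ]

    # Persist the entities as a single JSON object in S3.
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(entities).encode("utf-8"),
        ContentType="application/json",
    )
    return entities


# Example usage (requires AWS credentials; bucket and key are placeholders):
#   import boto3
#   extract_and_store_entities(
#       boto3.client("comprehend"), boto3.client("s3"),
#       "Jane Doe joined Example Corp in Seattle.",
#       "my-example-bucket", "entities/doc1.json",
#   )
```

The whole flow is two API calls, which is why it is faster to stand up than the options that add OCR hosting or human review.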