A global company receives and processes hundreds of documents daily. The documents are in…

Question

A global company receives and processes hundreds of documents daily. The documents are in printed .pdf format or .jpg format. A machine learning (ML) specialist wants to build an automated document processing workflow to extract text from specific fields from the documents and to classify the documents. The ML specialist wants a solution that requires low maintenance. Which solution will meet these requirements with the LEAST operational effort?

Accepted Answer

Correct answer: D. Use Amazon Textract to detect and extract the required text and fields. Use Amazon Comprehend to classify the document.

Amazon Textract is a fully managed service that extracts text and structured data from scanned documents, so there is no custom OCR model (such as PaddleOCR hosted on Amazon SageMaker) to train, deploy, or maintain. Amazon Comprehend is a managed natural language processing service that can classify documents by their text, whereas Amazon Rekognition is built for computer vision tasks on images and video rather than text classification. Combining Textract and Comprehend gives a serverless, low-maintenance workflow with the least operational effort.
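As a rough illustration of how the two services might be chained, here is a minimal Python sketch. It assumes single-page input sent synchronously, Textract's QUERIES feature for pulling specific fields, and a Comprehend custom classifier that has already been trained and deployed to an endpoint. The function names, the query wording, and the endpoint ARN are hypothetical and not part of the original question. The clients are passed in as parameters so the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict


def process_document(
    textract: Any,
    comprehend: Any,
    document_bytes: bytes,
    queries: Dict[str, str],
    endpoint_arn: str,
) -> Dict[str, Any]:
    """Extract the requested fields with Textract, then classify with Comprehend.

    `queries` maps a field alias (hypothetical, chosen by the caller) to a
    natural-language question, e.g. {"invoice_number": "What is the invoice number?"}.
    """
    response = textract.analyze_document(
        Document={"Bytes": document_bytes},
        FeatureTypes=["QUERIES"],
        QueriesConfig={
            "Queries": [
                {"Text": question, "Alias": alias}
                for alias, question in queries.items()
            ]
        },
    )
    blocks = response["Blocks"]
    by_id = {block["Id"]: block for block in blocks}

    # Each QUERY block points to its QUERY_RESULT block(s) through an ANSWER relationship.
    fields: Dict[str, str] = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"]["Alias"]
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "ANSWER":
                fields[alias] = " ".join(
                    by_id[block_id]["Text"] for block_id in relationship["Ids"]
                )

    # The same response also carries LINE blocks; join them into the text to classify.
    text = "\n".join(b["Text"] for b in blocks if b["BlockType"] == "LINE")

    # Assumes a multi-class custom classifier, which returns a "Classes" list.
    result = comprehend.classify_document(Text=text, EndpointArn=endpoint_arn)
    best = max(result["Classes"], key=lambda c: c["Score"])

    return {"fields": fields, "text": text, "label": best["Name"], "score": best["Score"]}


def process_file(path: str, queries: Dict[str, str], endpoint_arn: str) -> Dict[str, Any]:
    """Convenience wrapper that creates real boto3 clients (requires AWS credentials)."""
    import boto3  # imported lazily so the parsing logic can be tested without boto3

    with open(path, "rb") as handle:
        document_bytes = handle.read()
    return process_document(
        boto3.client("textract"),
        boto3.client("comprehend"),
        document_bytes,
        queries,
        endpoint_arn,
    )
```

Multi-page PDFs would typically go through Textract's asynchronous API (`start_document_analysis` with the file in Amazon S3) instead of the synchronous call shown here, and long documents may need to be truncated or split before classification.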

AWS Certified Machine Learning – Specialty — Question 321
