AWS Certified Machine Learning – Specialty — Question 277
A data scientist is using Amazon Comprehend to perform sentiment analysis on a dataset of one million social media posts.
Which approach will process the dataset in the LEAST time?
Answer options
- A. Use a combination of AWS Step Functions and an AWS Lambda function to call the DetectSentiment API operation for each post synchronously.
- B. Use a combination of AWS Step Functions and an AWS Lambda function to call the BatchDetectSentiment API operation with batches of up to 25 posts at a time.
- C. Upload the posts to Amazon S3. Pass the S3 storage path to an AWS Lambda function that calls the StartSentimentDetectionJob API operation.
- D. Use an AWS Lambda function to call the BatchDetectSentiment API operation with the whole dataset.
Correct answer: C
Explanation
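Using the asynchronous StartSentimentDetectionJob API is the most efficient way to analyze a large dataset. The caller submits a single request that points at the posts stored in Amazon S3, and Amazon Comprehend processes the documents in parallel in the background and writes the results back to S3. The synchronous APIs would instead need one HTTP request per post with DetectSentiment (1,000,000 requests for option A) or one request per 25 posts with BatchDetectSentiment (1,000,000 / 25 = 40,000 requests for option B). Each request adds latency and counts against the API's rate limits, so both approaches risk throttling. Additionally, BatchDetectSentiment accepts at most 25 documents per request, so option D, which sends the whole dataset in one call, is technically impossible.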
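A minimal sketch of option C in Python. It assumes the posts have already been uploaded to S3 with one post per line (the ONE_DOC_PER_LINE input format) and that an IAM role exists that lets Comprehend read the input and write the output. The event keys `input_s3_uri`, `output_s3_uri`, and `role_arn`, the default job name, and the helper `build_job_request` are hypothetical. The request fields follow the boto3 Comprehend client's `start_sentiment_detection_job` parameters. The client can be passed in so the handler can be exercised without AWS credentials.

```python
from typing import Any, Optional, Protocol


class ComprehendLike(Protocol):
    """Anything exposing the boto3 Comprehend client's start_sentiment_detection_job method."""

    def start_sentiment_detection_job(self, **kwargs: Any) -> dict: ...


def build_job_request(
    input_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    language_code: str = "en",
    job_name: str = "social-posts-sentiment",  # hypothetical name
) -> dict:
    """Build keyword arguments for StartSentimentDetectionJob.

    Assumes each line of the input file(s) under input_s3_uri is one post.
    """
    return {
        "JobName": job_name,
        "LanguageCode": language_code,
        "DataAccessRoleArn": role_arn,  # role Comprehend assumes to access S3
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
    }


def lambda_handler(event: dict, context: Any, client: Optional[ComprehendLike] = None) -> dict:
    """Submit one asynchronous sentiment job for the whole S3 dataset."""
    if client is None:
        import boto3  # bundled with the AWS Lambda Python runtimes

        client = boto3.client("comprehend")
    # Hypothetical event keys supplied by the caller (for example a Step Functions state).
    request = build_job_request(event["input_s3_uri"], event["output_s3_uri"], event["role_arn"])
    response = client.start_sentiment_detection_job(**request)
    # The call returns once the job is submitted; Comprehend analyzes the posts in the background.
    return {"JobId": response["JobId"], "JobStatus": response["JobStatus"]}
```

The handler returns as soon as the job is accepted, so it never comes close to the Lambda timeout regardless of dataset size. Progress can be checked later with DescribeSentimentDetectionJob, and the results are written under the output S3 URI.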