A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyz…

Question

A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage.
Which AWS service strategy is best for this use case?

Accepted Answer

Correct answer: C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.

Amazon Elasticsearch Service is built for indexing, searching, and analyzing large volumes of free text, which matches this use case. Options A and D do not scale effectively to a dataset of this size. Option B is suitable for general processing, but it may not provide the specialized text analysis features that Elasticsearch offers.
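
As a rough illustration of option C, the sketch below shows one way the Python Elasticsearch Client might be used to index e-mail bodies and then query the text index. The index name `emails`, the `body` field, the `s3_key` identifier, the helper function names, and the phrase-matching query are all assumptions made for this example; none of them come from the question. Authentication and request signing for an Amazon Elasticsearch Service endpoint are omitted, and the `body=` argument to `search` may need adjusting for the client version in use.

```python
from typing import Dict, Iterable, Iterator

INDEX_NAME = "emails"  # hypothetical index name, chosen for this sketch


def bulk_actions(emails: Iterable[Dict[str, str]],
                 index: str = INDEX_NAME) -> Iterator[dict]:
    """Yield one bulk-index action per e-mail.

    Each input dict is assumed to carry an 's3_key' (used as the document id)
    and a 'body' (the free text to analyze).
    """
    for email in emails:
        yield {
            "_index": index,
            "_id": email["s3_key"],
            "_source": {"body": email["body"]},
        }


def spam_phrase_query(phrases: Iterable[str]) -> dict:
    """Build a bool query that matches documents containing any given phrase."""
    return {
        "bool": {
            "should": [{"match_phrase": {"body": p}} for p in phrases],
            "minimum_should_match": 1,
        }
    }


def index_and_search(endpoint: str,
                     emails: Iterable[Dict[str, str]],
                     phrases: Iterable[str]) -> dict:
    """Index the e-mails, then search the text index for the phrases."""
    # Imported here so the helpers above work without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(endpoint)
    helpers.bulk(es, bulk_actions(emails))  # send the index actions in bulk
    return es.search(index=INDEX_NAME,
                     body={"query": spam_phrase_query(phrases)})
```

Only `index_and_search` talks to a cluster; the two helper functions just build plain dictionaries, so they can be inspected or unit-tested without a running Elasticsearch domain.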

AWS Certified Big Data – Specialty — Question 17
