A solutions architect manages an analytics application. The application stores large amou…

Question

A solutions architect manages an analytics application. The application stores large amounts of semistructured data in an Amazon S3 bucket. The solutions architect wants to use parallel data processing to process the data more quickly. The solutions architect also wants to use information that is stored in an Amazon Redshift database to enrich the data. Which solution will meet these requirements?

Accepted Answer

Correct answer: B. Use Amazon EMR to process the S3 data. Use Amazon EMR with the Amazon Redshift data to enrich the S3 data.

Amazon EMR is a managed cluster platform for running big data frameworks such as Apache Spark and Apache Hadoop, which distribute work across the nodes of a cluster. That makes it well suited to parallel processing of large semistructured datasets in Amazon S3. A Spark job on EMR can also read from Amazon Redshift (for example, through the Amazon Redshift integration for Apache Spark) and join that data with the S3 data across the cluster, so a single service covers both the processing and the enrichment. The other options rely on Amazon Athena, AWS Glue, or AWS Lake Formation, which are a weaker fit here: they do not offer the same level of customizable parallel processing combined with direct Redshift integration for enrichment at this scale.
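
The enrichment pattern described above can be illustrated with a toy, single-machine sketch. It makes no AWS calls and is not how an EMR job is actually written: the in-memory JSON lines stand in for objects in S3, the `CUSTOMER_DIM` dictionary stands in for a small table read from Redshift, and a thread pool stands in for cluster workers. All names here (`RAW_LINES`, `CUSTOMER_DIM`, `enrich_in_parallel`, and the field names) are hypothetical and chosen only for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for semistructured S3 data: JSON lines, some fields optional.
RAW_LINES = [
    '{"event_id": 1, "customer_id": "c1", "amount": 10.0}',
    '{"event_id": 2, "customer_id": "c2"}',
    '{"event_id": 3, "customer_id": "c9", "amount": 5.5}',
    '{"event_id": 4, "customer_id": "c1", "amount": 2.5}',
]

# Hypothetical stand-in for a small lookup table loaded from Redshift.
CUSTOMER_DIM = {"c1": {"segment": "retail"}, "c2": {"segment": "enterprise"}}


def split_into_partitions(items, num_partitions):
    """Distribute items round-robin into num_partitions lists."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def enrich_partition(lines, dim):
    """Parse each JSON line and left-join it against the lookup table."""
    enriched = []
    for line in lines:
        record = json.loads(line)
        match = dim.get(record.get("customer_id"), {})
        record["segment"] = match.get("segment")  # None when no match exists
        record.setdefault("amount", 0.0)  # fill a field the record may omit
        enriched.append(record)
    return enriched


def enrich_in_parallel(lines, dim, num_partitions=2):
    """Enrich each partition on its own worker, then merge the results."""
    partitions = split_into_partitions(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(enrich_partition, partitions, [dim] * len(partitions))
        merged = [record for part in results for record in part]
    # Sort so the output order does not depend on how records were partitioned.
    return sorted(merged, key=lambda r: r["event_id"])


if __name__ == "__main__":
    for row in enrich_in_parallel(RAW_LINES, CUSTOMER_DIM):
        print(row)
```

Every worker receives the whole lookup table and joins its own slice of the records against it. This is roughly the idea behind a broadcast join in a distributed engine, where the small enrichment table is copied to each node while the large dataset stays partitioned.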

AWS Certified Solutions Architect – Associate (SAA-C03) — Question 568
