AWS Certified Solutions Architect – Associate (SAA-C03) — Question 568

A solutions architect manages an analytics application. The application stores large amounts of semistructured data in an Amazon S3 bucket. The solutions architect wants to use parallel data processing to process the data more quickly. The solutions architect also wants to use information that is stored in an Amazon Redshift database to enrich the data.

Which solution will meet these requirements?

Answer options

Correct answer: B

Explanation

Amazon EMR is a managed cluster platform for running big data frameworks such as Apache Spark and Apache Hadoop. These frameworks split a dataset into partitions and process the partitions in parallel across the cluster nodes, which suits large semistructured datasets stored in Amazon S3. The same EMR cluster can also read the enrichment data from Amazon Redshift, for example through the Amazon Redshift integration for Apache Spark, and join it with the S3 data, so processing and enrichment run in one place. The other services mentioned fit this workload less well: Amazon Athena is a serverless SQL query service, AWS Lake Formation governs access to a data lake rather than processing data, and AWS Glue is a serverless ETL service that exposes less control over the cluster and framework configuration than EMR.
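The partition-then-join pattern described above can be illustrated with a small, self-contained Python sketch. It is not EMR or Spark code: the in-memory `RAW_LINES` list stands in for JSON objects in S3, the `CUSTOMER_DIM` dictionary stands in for a small lookup table read from Redshift, and a thread pool stands in for cluster workers. All names and sample values are hypothetical and chosen only for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a small lookup table read from Amazon Redshift.
CUSTOMER_DIM = {
    "c1": {"segment": "enterprise"},
    "c2": {"segment": "startup"},
}

# Hypothetical stand-in for semistructured JSON lines stored in Amazon S3.
# Records do not all share the same fields.
RAW_LINES = [
    '{"customer_id": "c1", "event": "login"}',
    '{"customer_id": "c2", "event": "purchase", "amount": 42}',
    '{"customer_id": "c9", "event": "login"}',
    '{"customer_id": "c1", "event": "logout"}',
]


def split_into_partitions(items, num_partitions):
    """Distribute items round-robin across num_partitions lists."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def enrich_partition(lines, dim):
    """Parse each JSON line and attach the matching lookup attributes."""
    enriched = []
    for line in lines:
        record = json.loads(line)
        match = dim.get(record["customer_id"], {})
        # Behaves like a left join: unmatched records get segment=None.
        record["segment"] = match.get("segment")
        enriched.append(record)
    return enriched


def enrich_in_parallel(lines, dim, num_partitions=2):
    """Enrich each partition on its own worker, then combine the results."""
    partitions = split_into_partitions(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Every worker receives the full lookup table, similar in spirit
        # to a broadcast join in a cluster framework.
        results = list(pool.map(lambda part: enrich_partition(part, dim), partitions))
    return [record for part in results for record in part]


if __name__ == "__main__":
    for row in enrich_in_parallel(RAW_LINES, CUSTOMER_DIM):
        print(row)
```

On a real EMR cluster, a framework such as Spark handles the partitioning, scheduling, and join strategy itself; the sketch only shows why a workload of this shape parallelizes well, since each partition can be enriched independently once the lookup data is available to every worker.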