AWS Certified Solutions Architect – Associate (SAA-C03) — Question 568
A solutions architect manages an analytics application. The application stores large amounts of semistructured data in an Amazon S3 bucket. The solutions architect wants to use parallel data processing to process the data more quickly. The solutions architect also wants to use information that is stored in an Amazon Redshift database to enrich the data.
Which solution will meet these requirements?
Answer options
- A. Use Amazon Athena to process the S3 data. Use AWS Glue with the Amazon Redshift data to enrich the S3 data.
- B. Use Amazon EMR to process the S3 data. Use Amazon EMR with the Amazon Redshift data to enrich the S3 data.
- C. Use Amazon EMR to process the S3 data. Use Amazon Kinesis Data Streams to move the S3 data into Amazon Redshift so that the data can be enriched.
- D. Use AWS Glue to process the S3 data. Use AWS Lake Formation with the Amazon Redshift data to enrich the S3 data.
Correct answer: B
Explanation
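Amazon EMR is a managed cluster platform for running big data frameworks such as Apache Spark and Apache Hadoop, which split work across many nodes. That makes it a good fit for parallel processing of large semistructured datasets in Amazon S3. Spark on EMR can also read from Amazon Redshift through the Amazon Redshift integration for Apache Spark, so the same cluster can load the enrichment data and join it with the S3 data.
The other options each pair at least one service with a job it is not designed for. Amazon Athena (option A) is an interactive SQL query service rather than a general-purpose data processing framework. Amazon Kinesis Data Streams (option C) ingests real-time streaming data and is not meant for moving data that already sits in S3 into Redshift. AWS Lake Formation (option D) manages data lake permissions and governance and does not perform the enrichment itself.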
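To make the "process in parallel, then enrich from a lookup source" idea concrete, here is a minimal conceptual sketch in plain Python. It is not EMR or Spark code: the names `CUSTOMER_DIM`, `PARTITIONS`, `enrich_partition`, and `enrich_all` are hypothetical, the in-memory dictionary stands in for rows read from Amazon Redshift, and the lists of JSON lines stand in for S3 objects. The thread pool only illustrates how the work is divided by partition; on a real cluster a framework such as Spark would distribute the partitions across nodes.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a small lookup table read from Amazon Redshift.
CUSTOMER_DIM = {
    "c1": {"segment": "enterprise"},
    "c2": {"segment": "startup"},
}

# Hypothetical stand-in for semistructured S3 data: one list of JSON lines per partition.
PARTITIONS = [
    ['{"customer_id": "c1", "event": "login"}',
     '{"customer_id": "c3", "event": "view"}'],
    ['{"customer_id": "c2", "event": "purchase", "amount": 40}'],
]


def enrich_partition(lines, lookup):
    """Parse one partition of JSON lines and attach lookup attributes (left join)."""
    enriched = []
    for line in lines:
        record = json.loads(line)
        extra = lookup.get(record.get("customer_id"), {})
        # Records with no match keep their fields and get segment=None.
        enriched.append({**record, "segment": extra.get("segment")})
    return enriched


def enrich_all(partitions, lookup, workers=4):
    """Enrich partitions concurrently; every worker reads the same lookup table."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: enrich_partition(part, lookup), partitions)
    # pool.map preserves partition order, so the output order is deterministic.
    return [row for part in results for row in part]


if __name__ == "__main__":
    for row in enrich_all(PARTITIONS, CUSTOMER_DIM):
        print(row)
```

Sharing one read-only lookup table with every worker mirrors what is often called a broadcast join, which tends to work well when the enrichment table is much smaller than the data being processed. Records may carry different fields (only one has `amount` here), which is the kind of variation semistructured data implies.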