AWS Certified Solutions Architect – Associate (SAA-C03) — Question 677
A company’s applications use Apache Hadoop and Apache Spark to process data on premises. The existing infrastructure is not scalable and is complex to manage.
A solutions architect must design a scalable solution that reduces operational complexity. The solution must keep the data processing on premises.
Which solution will meet these requirements?
Answer options
- A. Use AWS Site-to-Site VPN to access the on-premises Hadoop Distributed File System (HDFS) data and application. Use an Amazon EMR cluster to process the data.
- B. Use AWS DataSync to connect to the on-premises Hadoop Distributed File System (HDFS) cluster. Create an Amazon EMR cluster to process the data.
- C. Migrate the Apache Hadoop application and the Apache Spark application to Amazon EMR clusters on AWS Outposts. Use the EMR clusters to process the data.
- D. Use an AWS Snowball device to migrate the data to an Amazon S3 bucket. Create an Amazon EMR cluster to process the data.
Correct answer: C
Explanation
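Amazon EMR on AWS Outposts runs managed EMR clusters on Outposts hardware installed in the company’s own data center, so the Hadoop and Spark jobs still execute on premises. Because EMR handles cluster provisioning, configuration, and resizing, the solution reduces operational complexity and scales more easily than the existing self-managed infrastructure. Options A, B, and D are incorrect because each one runs the EMR cluster in an AWS Region: A reaches the on-premises HDFS data over a VPN, B transfers it with DataSync, and D ships it to Amazon S3 with Snowball. In all three, processing moves to the AWS cloud, which violates the requirement to keep processing on premises.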
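To make option C concrete, the following is a minimal sketch of how such a cluster might be requested with boto3. It relies on the fact that an EMR cluster is placed on an Outpost by launching it into a subnet associated with that Outpost. The cluster name, release label, instance types, node counts, and the default role names are illustrative assumptions, not values from the question, and the function names are hypothetical helpers.

```python
def build_emr_on_outposts_request(outpost_subnet_id, core_nodes=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The cluster lands on the Outpost because Ec2SubnetId refers to a
    subnet that was created on the Outpost (assumed to exist already).
    """
    return {
        "Name": "hadoop-spark-on-outposts",   # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",         # assumed release; pick one your Outpost supports
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "Ec2SubnetId": outpost_subnet_id,
            "KeepJobFlowAliveWhenNoSteps": True,
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed to be available on the Outpost
                    "InstanceCount": 1,
                    "Market": "ON_DEMAND",
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_nodes,
                    "Market": "ON_DEMAND",
                },
            ],
        },
        # Default EMR role names; assumed to have been created in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region_name):
    """Submit the request to EMR and return the new cluster ID."""
    import boto3  # imported here so the request can be built without AWS access

    emr = boto3.client("emr", region_name=region_name)
    return emr.run_job_flow(**request)["JobFlowId"]
```

A caller would pass the ID of the Outpost subnet, for example `launch_cluster(build_emr_on_outposts_request("subnet-0123456789abcdef0"), "us-east-1")`, where both the subnet ID and the Region are placeholders. Scaling the cluster is then a matter of changing the core node count through EMR, within the capacity installed on the Outpost.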