AWS Certified Solutions Architect – Professional — Question 389
A company is running an Apache Hadoop cluster on Amazon EC2 instances. The Hadoop cluster stores approximately 100 TB of data for weekly operational reports and allows occasional access for data scientists to retrieve data. The company needs to reduce the cost and operational complexity for storing and serving this data.
Which solution meets these requirements in the MOST cost-effective manner?
Answer options
- A. Move the Hadoop cluster from EC2 instances to Amazon EMR. Allow data access patterns to remain the same.
- B. Write a script that resizes the EC2 instances to a smaller instance type during downtime and resizes the instances to a larger instance type before the reports are created.
- C. Move the data to Amazon S3 and use Amazon Athena to query the data for reports. Allow the data scientists to access the data directly in Amazon S3.
- D. Migrate the data to Amazon DynamoDB and modify the reports to fetch data from DynamoDB. Allow the data scientists to access the data directly in DynamoDB.
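Correct answer: C
Explanation
Moving the data to Amazon S3 and querying it with Amazon Athena removes the cluster entirely, which addresses both requirements. S3 storage costs far less per gigabyte than HDFS on instance-attached volumes, where the default replication factor of 3 triples the raw footprint of the 100 TB dataset. Athena is serverless and bills per amount of data scanned, which suits a weekly reporting workload, and data scientists can read the objects directly from S3 for occasional access. Moving to Amazon EMR while keeping the same access patterns (Option A) reduces some administrative work, but the data stays in HDFS on a long-running cluster, so storage and compute are still paid for between reports. Scripting EC2 resizing (Option B) adds operational complexity, leaves the self-managed Hadoop cluster in place, and does nothing about storage cost. DynamoDB (Option D) is a key-value store built for low-latency transactional access; holding 100 TB there for analytical reports would be expensive and would require the reports to be rewritten.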
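A rough cost estimate helps when weighing the storage side of these options. The sketch below compares monthly storage for 100 TB kept in HDFS on block volumes against the same data in Amazon S3, plus an Athena scan charge for weekly reports. Every price and the per-report scan volume are illustrative assumptions, not quotes (check the current AWS pricing pages for your Region), and the HDFS figure leaves out EC2 compute, which would widen the gap further.

```python
# All prices below are ASSUMED, illustrative values; real pricing varies
# by Region and tier and changes over time.
S3_STANDARD_USD_PER_GB_MONTH = 0.023   # assumed flat S3 Standard rate
BLOCK_VOLUME_USD_PER_GB_MONTH = 0.08   # assumed block-storage rate
ATHENA_USD_PER_TB_SCANNED = 5.00       # assumed Athena scan rate
HDFS_REPLICATION_FACTOR = 3            # HDFS default replication factor

GB_PER_TB = 1024


def hdfs_storage_cost(data_tb: float) -> float:
    """Monthly block-storage cost for data held in HDFS with 3x replication."""
    raw_gb = data_tb * GB_PER_TB * HDFS_REPLICATION_FACTOR
    return raw_gb * BLOCK_VOLUME_USD_PER_GB_MONTH


def s3_storage_cost(data_tb: float) -> float:
    """Monthly S3 storage cost for the same logical data (no replication multiplier)."""
    return data_tb * GB_PER_TB * S3_STANDARD_USD_PER_GB_MONTH


def athena_query_cost(tb_scanned_per_report: float, reports_per_month: int) -> float:
    """Monthly Athena charge, assuming each report scans a fixed volume of data."""
    return tb_scanned_per_report * reports_per_month * ATHENA_USD_PER_TB_SCANNED


if __name__ == "__main__":
    data_tb = 100                 # from the question
    tb_scanned_per_report = 2     # hypothetical: each weekly report scans 2 TB
    reports_per_month = 4         # weekly reports

    hdfs = hdfs_storage_cost(data_tb)
    s3 = s3_storage_cost(data_tb)
    athena = athena_query_cost(tb_scanned_per_report, reports_per_month)

    print(f"HDFS on block volumes: ${hdfs:,.0f}/month (storage only)")
    print(f"S3 Standard:           ${s3:,.0f}/month")
    print(f"Athena scans:          ${athena:,.0f}/month")
```

Storing the data in a columnar, compressed, partitioned layout would lower the amount Athena scans per report, so the scan figure here is likely an upper bound under these assumptions.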