AWS Certified Solutions Architect – Professional — Question 625
A company is running a commercial Apache Hadoop cluster on Amazon EC2. This cluster is being used daily to query large files on Amazon S3. The data on
Amazon S3 has been curated and does not require any additional transformation steps. The company is using a commercial business intelligence (BI) tool on
Amazon EC2 to run queries against the Hadoop cluster and visualize the data.
The company wants to reduce or eliminate the overhead costs associated with managing the Hadoop cluster and the BI tool. The company would like to move to a more cost-effective solution with minimal effort. The visualization is simple and requires only some basic aggregation steps.
Which option will meet the company's requirements?
Answer options
- A. Launch a transient Amazon EMR cluster daily and develop an Apache Hive script to analyze the files on Amazon S3. Shut down the Amazon EMR cluster when the job is complete. Then use Amazon QuickSight to connect to Amazon EMR and perform the visualization.
- B. Develop a stored procedure invoked from a MySQL database running on Amazon EC2 to analyze the files in Amazon S3. Then use a fast in-memory BI tool running on Amazon EC2 to visualize the data.
- C. Develop a script that uses Amazon Athena to query and analyze the files on Amazon S3. Then use Amazon QuickSight to connect to Athena and perform the visualization.
- D. Use a commercial extract, transform, load (ETL) tool that runs on Amazon EC2 to prepare the data for processing. Then switch to a faster and cheaper BI tool that runs on Amazon EC2 to visualize the data from Amazon S3.
Correct answer: C
Explanation
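Amazon Athena is a serverless, interactive query service that queries data in Amazon S3 directly using standard SQL, so there is no Hadoop cluster to manage. Because the data is already curated and only basic aggregation is needed, plain SQL queries are sufficient. Amazon QuickSight is a fully managed, serverless BI service that supports Athena as a data source, which removes the need to run and maintain a BI tool on EC2. The other options all leave infrastructure to manage. Option A still involves launching and configuring an Amazon EMR cluster every day and writing a Hive script, and QuickSight cannot query a cluster that has already been shut down. Option B replaces the Hadoop cluster with a MySQL database on EC2 and keeps a BI tool on EC2. Option D adds an ETL tool on EC2 even though the data needs no further transformation, and also keeps a BI tool on EC2.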
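As an illustration of the "script that uses Amazon Athena" in option C, here is a minimal Python sketch, not a definitive implementation. It assumes a client object exposing the Athena API calls `start_query_execution`, `get_query_execution`, and `get_query_results` (for example, `boto3.client("athena")` from the AWS SDK for Python). The function name `run_athena_query` and its parameters are hypothetical, and the sketch reads only the first page of results.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a SQL query to Athena, wait for it to finish, and return rows as dicts.

    `client` is assumed to behave like boto3.client("athena").
    `database` and `output_location` are caller-supplied; nothing here is
    specific to the company in the question.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)

    # First page only. For SELECT queries the first row holds the column names.
    rows = client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

Against a real account, the function could be called with `boto3.client("athena")`, a basic aggregation such as `SELECT region, SUM(amount) AS total FROM sales GROUP BY region`, and an S3 location for query results. The table, column, and bucket names in that example are placeholders. In the architecture described by option C, QuickSight would normally connect to Athena directly as a data source, so a script like this is only needed for analysis outside the dashboard.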