AWS Certified Solutions Architect – Associate (SAA-C03) — Question 743
A company has stored 10 TB of log files in Apache Parquet format in an Amazon S3 bucket. The company occasionally needs to use SQL to analyze the log files.
Which solution will meet these requirements MOST cost-effectively?
Answer options
- A. Create an Amazon Aurora MySQL database. Migrate the data from the S3 bucket into Aurora by using AWS Database Migration Service (AWS DMS). Issue SQL statements to the Aurora database.
- B. Create an Amazon Redshift cluster. Use Redshift Spectrum to run SQL statements directly on the data in the S3 bucket.
- C. Create an AWS Glue crawler to store and retrieve table metadata from the S3 bucket. Use Amazon Athena to run SQL statements directly on the data in the S3 bucket.
- D. Create an Amazon EMR cluster. Use Apache Spark SQL to run SQL statements directly on the data in the S3 bucket.
Correct answer: C
Explanation
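Amazon Athena is a serverless, interactive query service that runs SQL directly against data in Amazon S3 and charges only for the data each query scans. With no instance or cluster to provision, nothing is billed between queries, which suits occasional analysis. Because Apache Parquet is a columnar format, Athena reads only the columns a query references, which further reduces the amount of data scanned. An AWS Glue crawler discovers the schema of the Parquet files and registers it as a table in the AWS Glue Data Catalog, which Athena uses to query the data. The other options all require provisioning and paying for database instances or clusters: Aurora (A) also means migrating and storing a second copy of the 10 TB, Redshift Spectrum (B) still needs a Redshift cluster to issue the queries, and EMR (D) needs a cluster that must be launched and managed to run Spark SQL. For infrequent use, each of these costs more than paying per query.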
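The cost argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the per-TB scan price, the fraction of data a query touches, and the hourly cluster rate are all assumed placeholder values, not figures from the question, and should be replaced with current pricing for the Region in use. The function names are hypothetical.

```python
# Rough cost comparison: pay-per-scan (Athena-style) vs. an always-on cluster.
# All prices below are ASSUMPTIONS for illustration; check current AWS pricing.

def scan_cost_usd(dataset_tb, fraction_scanned=1.0, price_per_tb=5.0):
    """Estimate the cost of one query billed by data scanned.

    dataset_tb       -- total size of the data set in TB
    fraction_scanned -- share of the data the query reads (columnar formats
                        such as Parquet let a query skip unused columns)
    price_per_tb     -- ASSUMED price in USD per TB scanned
    """
    if not 0.0 <= fraction_scanned <= 1.0:
        raise ValueError("fraction_scanned must be between 0 and 1")
    return dataset_tb * fraction_scanned * price_per_tb


def provisioned_cost_usd(hourly_rate, hours_running):
    """Estimate the cost of a cluster billed per hour while it is running."""
    return hourly_rate * hours_running


# Worst case: one query scans all 10 TB.
full_scan = scan_cost_usd(10.0)

# A query that reads only about 10% of the data (a few columns).
pruned_scan = scan_cost_usd(10.0, fraction_scanned=0.1)

# HYPOTHETICAL cluster at 1 USD/hour left running for a 730-hour month.
cluster_month = provisioned_cost_usd(hourly_rate=1.0, hours_running=730)

print(f"full scan:     ${full_scan:.2f}")
print(f"pruned scan:   ${pruned_scan:.2f}")
print(f"cluster/month: ${cluster_month:.2f}")
```

Under these assumed numbers, a handful of queries per month costs far less than keeping a cluster running, which is the reasoning behind answer C. The gap narrows as query volume grows, so the conclusion depends on the "occasionally" in the question.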