A company has stored 10 TB of log files in Apache Parquet format in an Amazon S3 bucket.…

Question

A company has stored 10 TB of log files in Apache Parquet format in an Amazon S3 bucket. The company occasionally needs to use SQL to analyze the log files. Which solution will meet these requirements MOST cost-effectively?

Accepted Answer

Correct answer: C. Create an AWS Glue crawler to store and retrieve table metadata from the S3 bucket. Use Amazon Athena to run SQL statements directly on the data in the S3 bucket.

Amazon Athena is a serverless, interactive query service that runs SQL queries directly against data in Amazon S3 and charges only for the data each query scans, which makes it highly cost-effective for occasional analysis. An AWS Glue crawler can automatically discover and catalog the schema of the Apache Parquet files so that Athena can query them as tables. The alternatives involving Amazon Aurora, Amazon Redshift, and Amazon EMR require provisioning and paying for running database instances or clusters, which makes them significantly more expensive for infrequent use.
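The cost argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the per-TB scan price, the hourly cluster rate, and the query volumes are assumptions chosen for the example, not quoted AWS prices, and the function names are hypothetical. Because Parquet is a columnar format, a query that reads only a few columns typically scans far less than the full 10 TB.

```python
# Rough monthly cost comparison: pay-per-scan queries vs. an always-on cluster.
# Every price below is an ASSUMPTION for illustration; check current AWS pricing.

ASSUMED_USD_PER_TB_SCANNED = 5.00      # assumed pay-per-scan rate
ASSUMED_CLUSTER_USD_PER_HOUR = 1.00    # hypothetical always-on cluster rate
HOURS_PER_MONTH = 730


def pay_per_scan_monthly_cost(queries_per_month: int,
                              tb_scanned_per_query: float,
                              usd_per_tb: float = ASSUMED_USD_PER_TB_SCANNED) -> float:
    """Cost grows only with the amount of data the queries actually read."""
    return queries_per_month * tb_scanned_per_query * usd_per_tb


def always_on_monthly_cost(usd_per_hour: float = ASSUMED_CLUSTER_USD_PER_HOUR) -> float:
    """A provisioned cluster is billed for every hour, whether it is queried or not."""
    return usd_per_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical workload: four queries a month, each scanning 0.5 TB.
    scan_cost = pay_per_scan_monthly_cost(queries_per_month=4, tb_scanned_per_query=0.5)
    cluster_cost = always_on_monthly_cost()
    print(f"pay-per-scan: ${scan_cost:.2f}/month, always-on: ${cluster_cost:.2f}/month")
```

Under these assumed numbers the pay-per-scan model costs nothing in a month with no queries, while the provisioned cluster costs the same regardless of usage. That is the core reason the question's "occasionally" points to Athena.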

AWS Certified Solutions Architect – Associate (SAA-C03) — Question 743

Explanation