AWS Certified Data Engineer – Associate (DEA-C01) — Question 80

A company stores 10 to 15 TB of uncompressed .csv files in Amazon S3. The company is evaluating Amazon Athena as a one-time query engine.

The company wants to transform the data to optimize query runtime and storage costs.

Which file format and compression solution will meet these requirements for Athena queries?

Answer options

Correct answer: C

Explanation

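The correct answer is C. Apache Parquet is a columnar storage format, so Athena reads only the columns a query references instead of every full row. That reduces the amount of data scanned, which shortens query runtime and, because Athena bills by data scanned, lowers query cost. Columnar data also compresses well, which reduces S3 storage costs. Snappy compression provides a good balance between compression ratio and decompression speed. Options A and B use formats that are not optimized for Athena queries. Option D is a valid choice but does not offer the same level of performance optimization as Parquet with Snappy.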
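
One way to perform the transformation is an Athena CREATE TABLE AS SELECT (CTAS) statement that rewrites the .csv data as Snappy-compressed Parquet. The sketch below only builds the SQL text; it does not run anything against AWS. The helper name build_ctas, the table names, and the bucket path are hypothetical placeholders, not part of the question, and the sketch assumes the .csv files are already registered as an Athena table.

```python
def build_ctas(source_table: str, target_table: str, target_location: str) -> str:
    """Return an Athena CTAS statement that rewrites a table as Snappy-compressed Parquet.

    All arguments are illustrative placeholders; supply your own table names
    and an S3 location that you control.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "    format = 'PARQUET',\n"             # columnar output format
        "    write_compression = 'SNAPPY',\n"   # compression codec for the output files
        f"    external_location = '{target_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table};"
    )


if __name__ == "__main__":
    # Hypothetical names: "raw_csv" is assumed to be the existing table over the .csv files.
    print(build_ctas("raw_csv", "events_parquet", "s3://example-bucket/parquet/"))
```

The printed statement can be pasted into the Athena query editor. Selecting specific columns instead of * in later queries is what lets the columnar layout reduce the data scanned.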