AWS Certified Data Engineer – Associate (DEA-C01) — Question 6

A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column.
Which solution will MOST speed up the Athena query performance?

Answer options

Correct answer: C

Explanation

The correct answer is C. Apache Parquet is a columnar storage format, so a query that selects a specific column reads only that column's data instead of scanning every full row, as it must with .csv. Snappy compression further reduces the amount of data Athena has to read while staying fast to decompress. Option A is incorrect because JSON, like .csv, is a row-oriented text format and does not provide Parquet's benefits for column-specific queries. Option B only compresses the data without changing the format, so Athena still reads entire rows and query speed does not improve significantly. Option D compresses the files with gzip, but that is generally less effective for query performance than Snappy combined with a columnar format.
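
One common way to carry out the conversion described in option C is an Athena CREATE TABLE AS SELECT (CTAS) statement. The following is a minimal sketch, not a definitive implementation: the helper name build_parquet_ctas, the table names, and the S3 location are hypothetical placeholders, and the sketch assumes an Athena engine version that accepts the write_compression table property.

```python
def build_parquet_ctas(source_table: str, target_table: str, s3_location: str) -> str:
    """Build an Athena CTAS statement that rewrites a table as Snappy-compressed Parquet.

    All three arguments are illustrative placeholders; substitute real names.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"              # columnar storage format
        "  write_compression = 'SNAPPY',\n"    # compression applied to the Parquet output
        f"  external_location = '{s3_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    # Hypothetical table names and bucket, for illustration only.
    print(build_parquet_ctas(
        "sales_csv",
        "sales_parquet",
        "s3://example-bucket/sales_parquet/",
    ))
```

The generated statement can be run in the Athena query editor. Later queries that select a single column from the new table should then scan only that column's Parquet data rather than whole .csv rows.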