AWS Certified Data Analytics – Specialty — Question 156
A company has 10-15 TB of uncompressed .csv files in Amazon S3. The company is evaluating Amazon Athena as a one-time query engine. The company wants to transform the data to optimize query runtime and storage costs.
Which option for data format and compression meets these requirements?
Answer options
- A. CSV compressed with zip
- B. JSON compressed with bzip2
- C. Apache Parquet compressed with Snappy
- D. Apache Avro compressed with LZO
Correct answer: C
Explanation
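The correct answer is C. Apache Parquet is a columnar format, so Athena reads only the columns a query references instead of scanning whole rows. Because Athena bills by the amount of data scanned, this shortens query runtime and lowers query cost. Snappy compression shrinks the files further, which reduces S3 storage costs, and it decompresses quickly. Options A and B keep the data in row-oriented text formats, so every query still has to read entire records, and zip is not among the compression formats Athena can read. Option D uses a compact binary format, but Avro is also row-oriented, so it does not offer the column pruning that Parquet does.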
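One way to perform the transformation is an Athena CREATE TABLE AS SELECT (CTAS) statement. The Python sketch below only builds the SQL text; it does not call AWS. The function name `build_ctas`, the table names, and the bucket path are hypothetical placeholders, and the sketch assumes the CSV data is already registered as an Athena table.

```python
def build_ctas(source_table: str, target_table: str, target_location: str) -> str:
    """Return an Athena CTAS statement that rewrites a table
    as Snappy-compressed Parquet at the given S3 location."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"            # columnar output format
        "  write_compression = 'SNAPPY',\n"  # compression codec for the new files
        f"  external_location = '{target_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names, for illustration only.
sql = build_ctas(
    source_table="raw_csv_events",
    target_table="events_parquet",
    target_location="s3://example-bucket/events-parquet/",
)
print(sql)
```

The resulting statement could be run from the Athena console or submitted through an SDK. At 10-15 TB, the conversion may need to be split into several statements or handled by another tool.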