
Question

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3.
The company's analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data. The amount of data that is ingested into
Amazon S3 has increased substantially over time, and the query latency also has increased.
Which solutions could the company implement to improve query performance? (Choose two.)

Accepted Answer

Correct answer: C, E

C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.

E. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.

Option C is correct because converting the data to Apache Parquet and partitioning it significantly improves Athena query performance. Parquet is a columnar format, so Athena reads only the columns that a query references, and partitioning lets Athena skip data outside the recent subset that the analysts query. Option E compresses the files, which reduces the amount of data Athena reads from S3, although it does not provide the same level of improvement as partitioning and converting to a columnar format like Parquet. Options A, B, and D do not offer effective solutions for improving query performance in this scenario.
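To illustrate why partitioning helps when analysts query only a recent subset, here is a minimal sketch in plain Python. It is not Glue or Athena code. It only models a Hive-style `year=/month=/day=` prefix layout and shows how a date-range filter narrows the set of prefixes that need to be read. The bucket name, table root, and helper names (`partition_prefix`, `prune`) are hypothetical and chosen for illustration.

```python
from datetime import date, timedelta


def partition_prefix(table_root: str, day: date) -> str:
    """Build a Hive-style partition prefix such as root/year=2024/month=01/day=05/."""
    return f"{table_root}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def prune(prefixes, start: date, end: date, table_root: str):
    """Keep only the prefixes whose partition date falls within [start, end]."""
    wanted = set()
    current = start
    while current <= end:
        wanted.add(partition_prefix(table_root, current))
        current += timedelta(days=1)
    return [p for p in prefixes if p in wanted]


# Hypothetical table root with one partition per day for all of 2023.
ROOT = "s3://example-bucket/sales_parquet"
all_prefixes = [
    partition_prefix(ROOT, date(2023, 1, 1) + timedelta(days=i)) for i in range(365)
]

# A query restricted to the last 7 days touches 7 prefixes instead of 365.
recent = prune(all_prefixes, date(2023, 12, 25), date(2023, 12, 31), ROOT)
print(len(all_prefixes), len(recent))  # prints "365 7"
```

A query engine that understands the partition columns can apply the same idea to a `WHERE` clause on those columns, so a query over recent data avoids scanning the full history.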

AWS Certified Data Analytics – Specialty — Question 111
