AWS Certified Data Analytics – Specialty — Question 102

A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3. A data analyst needs to be able to query all data from a desired year, month, or day. The data analyst should also be able to query a subset of the columns. The company requires minimal operational overhead and the most cost-effective solution.
Which approach meets these requirements for optimizing and querying the log data?

Answer options

Correct answer: D

Explanation

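Option D is the correct answer because it uses AWS Glue to transform the logs into Apache Parquet format and partition them. Parquet is a columnar format, so a query that selects a subset of the columns reads only those columns, and partitioning lets a query for a given year, month, or day skip the data outside that range. Both reduce the amount of data scanned, which keeps query costs down, and AWS Glue is serverless, so there is no cluster to manage. Options A and C use formats that are less optimized for analytical queries than Parquet, while option B relies on a long-running Amazon EMR cluster, which increases operational overhead and cost.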
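
To make the partitioning idea concrete, here is a minimal Python sketch of a Hive-style `year=/month=/day=` layout and of how a date filter narrows the set of prefixes that need to be read. It is only an illustration of the technique, not the Glue job from option D: the bucket name `example-bucket`, the base path, and the helper names `partition_prefix` and `prune` are all hypothetical, and the sketch assumes the partition key is derived from each log entry's ISO 8601 timestamp.

```python
from datetime import datetime, timezone

# Hypothetical base location; the real bucket and path are not given in the question.
BASE = "s3://example-bucket/elb-logs-parquet"


def partition_prefix(timestamp: str, base: str = BASE) -> str:
    """Build a Hive-style partition prefix from an ISO 8601 timestamp."""
    # Normalize a trailing "Z" so datetime.fromisoformat accepts it on older Pythons.
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00")).astimezone(timezone.utc)
    return f"{base}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"


def prune(prefixes, year=None, month=None, day=None):
    """Keep only the prefixes that match the requested year, month, and/or day."""
    wanted = {"year": year, "month": month, "day": day}
    kept = []
    for prefix in prefixes:
        # Collect the key=value path segments, e.g. {"year": "2018", "month": "07", ...}.
        parts = dict(
            seg.split("=", 1) for seg in prefix.strip("/").split("/") if "=" in seg
        )
        if all(v is None or int(parts[k]) == v for k, v in wanted.items()):
            kept.append(prefix)
    return kept


if __name__ == "__main__":
    stamps = [
        "2018-07-02T22:23:00.186641Z",
        "2018-07-03T01:10:00.000000Z",
        "2019-01-15T08:00:00.000000Z",
    ]
    prefixes = [partition_prefix(s) for s in stamps]
    # Only the two July 2018 prefixes remain; the 2019 data is never touched.
    for p in prune(prefixes, year=2018, month=7):
        print(p)
```

A query engine that understands this layout applies the same kind of filtering to decide which partitions to read, which is why a query restricted to one year, month, or day does not need to scan the rest of the logs.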