A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3.…

Question

A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3. A data analyst needs to be able to query all data from a desired year, month, or day. The data analyst should also be able to query a subset of the columns. The company requires minimal operational overhead and the most cost-effective solution.
Which approach meets these requirements for optimizing and querying the log data?

Accepted Answer

Correct answer: D. Use an AWS Glue job nightly to transform new log files into Apache Parquet format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.

Option D is correct because Parquet is a columnar format, so Athena reads only the columns a query selects, and partitioning by year, month, and day lets Athena skip data outside the requested date range. Both reduce the amount of data scanned, which is what Athena charges for. AWS Glue and Athena are also serverless, which keeps operational overhead low. Options A and C use formats that are less optimized for analytical queries than Parquet, while option B involves a long-running EMR cluster, which increases operational overhead and cost.
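To make the partitioning idea concrete, here is a minimal Python sketch of the layout and the kind of query it enables. It is an illustration under stated assumptions, not the exam's reference implementation: the bucket path, the table name `elb_logs_parquet`, and the column names are hypothetical, and it assumes the partition columns are strings with zero-padded values (adjust the predicates if your table defines them as integers).

```python
from datetime import date
from typing import List, Optional, Sequence


def partition_prefix(base: str, day: date) -> str:
    """Return the Hive-style S3 prefix under which one day's Parquet files would live."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def build_query(
    table: str,
    columns: Sequence[str],
    year: int,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> str:
    """Build a SQL string that selects only the given columns and filters
    on the partition columns, so only matching partitions need to be read."""
    if day is not None and month is None:
        raise ValueError("a day filter requires a month filter")

    # Assumption: partition values are zero-padded strings, as in partition_prefix.
    predicates: List[str] = [f"year = '{year:04d}'"]
    if month is not None:
        predicates.append(f"month = '{month:02d}'")
    if day is not None:
        predicates.append(f"day = '{day:02d}'")

    return (
        f"SELECT {', '.join(columns)} "
        f"FROM {table} "
        f"WHERE {' AND '.join(predicates)}"
    )


# Hypothetical bucket, table, and column names, used only for illustration.
prefix = partition_prefix("s3://example-bucket/elb-logs-parquet", date(2023, 3, 7))
query = build_query("elb_logs_parquet", ["request_url", "elb_status_code"], 2023, 3)
```

Because the filter touches only the partition columns and the select list names only two columns, a query of this shape scans a fraction of the stored data, which is the source of the cost saving described in the answer.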

AWS Certified Data Analytics – Specialty — Question 102

Explanation