AWS Certified Data Analytics – Specialty — Question 117

A company wants to optimize the cost of its data and analytics platform. The company is ingesting a number of .csv and JSON files in Amazon S3 from various data sources. Incoming data is expected to be 50 GB each day. The company is using Amazon Athena to query the raw data in Amazon S3 directly. Most queries aggregate data from the past 12 months, and data that is older than 5 years is infrequently queried. The typical query scans about 500 MB of data and is expected to return results in less than 1 minute. The raw data must be retained indefinitely for compliance requirements.
Which solution meets the company's requirements?

Answer options

Correct answer: A

Explanation

Option A is correct. An AWS Glue ETL job compresses and partitions the data and converts it to a columnar format, so Athena reads only the columns and partitions that a query needs. Because Athena bills by the amount of data scanned, this lowers cost as well as query time. The lifecycle policies also match the stated access pattern: processed data transitions to S3 Standard-IA after 5 years, when it is infrequently queried, and raw data is archived to S3 Glacier after 7 days, which retains it indefinitely for compliance at a lower storage cost. The other options either use a row-based format or set the lifecycle timing incorrectly, so they do not maximize cost efficiency and performance.
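
The two lifecycle policies described above can be sketched as a single S3 lifecycle configuration. This is a minimal illustration, not a definitive setup: the prefixes raw/ and processed/, the rule IDs, and the helper names are assumptions that do not appear in the question, and 5 years is approximated as 5 * 365 days. The boto3 call is put_bucket_lifecycle_configuration, and lifecycle transitions count days from object creation.

```python
"""Sketch of the two lifecycle rules from option A (prefixes and IDs are assumed)."""

RAW_PREFIX = "raw/"              # hypothetical prefix for incoming .csv and JSON files
PROCESSED_PREFIX = "processed/"  # hypothetical prefix for the Glue ETL output

FIVE_YEARS_IN_DAYS = 5 * 365     # approximation; days are counted from object creation
RAW_ARCHIVE_DAYS = 7


def build_lifecycle_configuration():
    """Return the lifecycle configuration as a plain dict (no AWS call is made)."""
    return {
        "Rules": [
            {
                # Processed data: move to Standard-IA once it is rarely queried.
                "ID": "processed-to-standard-ia",
                "Filter": {"Prefix": PROCESSED_PREFIX},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": FIVE_YEARS_IN_DAYS, "StorageClass": "STANDARD_IA"}
                ],
            },
            {
                # Raw data: archive to Glacier shortly after ingestion.
                # No Expiration action, so objects are retained indefinitely.
                "ID": "raw-to-glacier",
                "Filter": {"Prefix": RAW_PREFIX},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": RAW_ARCHIVE_DAYS, "StorageClass": "GLACIER"}
                ],
            },
        ]
    }


def apply_lifecycle_configuration(bucket_name):
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the dict builder works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_lifecycle_configuration(),
    )
```

Calling apply_lifecycle_configuration with a bucket name of your own would replace that bucket's existing lifecycle configuration, so in practice any existing rules would need to be merged into the dict first. Neither rule sets an expiration, which is what keeps the raw data available indefinitely.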