AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 46

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.
The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.
Which solution will provide the HIGHEST performance for data retrieval?

Answer options

Correct answer: C

Explanation

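Option C is the best choice because partitioning the data by date lets Amazon Athena prune partitions: a query that filters on the date partition reads only the S3 objects under the matching partitions, so a 3-day trend report scans about 3 days of data instead of everything retained in the bucket (up to 30 days). Scanning less data is what makes retrieval faster. The other options either do not use partitioning (A, D) or add complexity without a performance benefit (B).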
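
As a rough illustration of date partitioning, the sketch below builds Hive-style S3 prefixes and an Athena query restricted to the last 3 daily partitions. The bucket name `example-bucket`, the table name `clicks`, and the partition column `dt` are hypothetical, and it assumes "past 3 days" includes the current day; none of these come from the question itself.

```python
from datetime import date, timedelta


def partition_prefix(base: str, day: date) -> str:
    """Return a Hive-style prefix such as s3://bucket/clicks/dt=2024-05-03/.

    The partition column name `dt` is an assumption for this sketch.
    """
    return f"{base.rstrip('/')}/dt={day.isoformat()}/"


def last_n_days(today: date, n: int) -> list[date]:
    """Return `n` consecutive days ending at `today`, newest first."""
    return [today - timedelta(days=i) for i in range(n)]


def build_trend_query(table: str, today: date, n: int = 3) -> str:
    """Build a SQL string that filters on the partition column only.

    Because the WHERE clause names specific `dt` values, Athena can skip
    every partition outside that list instead of scanning the whole table.
    """
    days = ", ".join(f"'{d.isoformat()}'" for d in last_n_days(today, n))
    return (
        f"SELECT dt, COUNT(*) AS clicks\n"
        f"FROM {table}\n"
        f"WHERE dt IN ({days})\n"
        f"GROUP BY dt\n"
        f"ORDER BY dt"
    )


if __name__ == "__main__":
    today = date(2024, 5, 3)
    # Prefixes under which each day's objects would be written.
    for d in last_n_days(today, 3):
        print(partition_prefix("s3://example-bucket/clicks", d))
    # Query text that could be submitted to Athena.
    print(build_trend_query("clicks", today))
```

The query string is only constructed here, not executed. Submitting it to Athena, and registering new daily partitions with the table, would be separate steps.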