Question

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.
The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.
Which solution will provide the HIGHEST performance for data retrieval?

Accepted Answer

Correct answer: C. Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

Option C is the best choice because partitioning the data by date lets Amazon Athena prune partitions: a query over the past 3 days reads only the matching date prefixes instead of scanning all 30 days of retained data, which gives the fastest retrieval. The S3 Lifecycle policy covers the retention requirement by moving data older than 30 days to archival storage. The other options either do not use partitioning (A, D) or add complexity without a performance benefit (B).
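The following is a minimal sketch of how option C might look in practice, not a definitive implementation. It assumes a hypothetical `clicks/` root prefix, a Hive-style partition key named `dt`, an Athena table named `clicks` whose `dt` partition column is a string, and that "the past 3 days" means today plus the two preceding days. None of these names come from the question. The helpers only build strings and a configuration dictionary; they make no AWS calls.

```python
from datetime import date, timedelta


def partition_prefix(day: date, root: str = "clicks") -> str:
    """Return a Hive-style date prefix such as clicks/dt=2024-05-01/.

    The root prefix and the `dt` key are illustrative assumptions.
    """
    return f"{root}/dt={day.isoformat()}/"


def last_n_days_query(today: date, n: int = 3, table: str = "clicks") -> str:
    """Build an Athena query restricted to the last n date partitions.

    Filtering on the partition column lets Athena skip every other
    date prefix. ISO dates compare correctly as strings.
    """
    start = today - timedelta(days=n - 1)
    return (
        f"SELECT dt, COUNT(*) AS clicks FROM {table} "
        f"WHERE dt BETWEEN '{start.isoformat()}' AND '{today.isoformat()}' "
        "GROUP BY dt ORDER BY dt"
    )


def lifecycle_configuration(root: str = "clicks", days: int = 30) -> dict:
    """Build an S3 Lifecycle configuration that archives old objects.

    "GLACIER" is the storage class value for S3 Glacier Flexible
    Retrieval. The rule counts days from object creation, so objects
    under each date prefix transition roughly 30 days after upload.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-{root}-after-{days}-days",
                "Filter": {"Prefix": f"{root}/"},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

The dictionary returned by `lifecycle_configuration` has the shape accepted by the `LifecycleConfiguration` parameter of boto3's `put_bucket_lifecycle_configuration`, and the query string could be submitted through the Athena console or API once the table's partitions are registered.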

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 46
