AWS Certified Solutions Architect – Associate (SAA-C02) — Question 588
A company wants to measure the effectiveness of its recent marketing campaigns. The company performs batch processing on .csv files of sales data and stores the results in an Amazon S3 bucket once every hour. The S3 bucket contains petabytes of objects. The company runs one-time queries in Amazon Athena to determine which products are most popular on a particular date for a particular region. Queries sometimes fail or take longer than expected to finish running.
Which actions should a solutions architect take to improve the query performance and reliability? (Choose two.)
Answer options
- A. Reduce the S3 object sizes to less than 128 MB.
- B. Partition the data by date and region in Amazon S3.
- C. Store the files as large, single objects in Amazon S3.
- D. Use Amazon Kinesis Data Analytics to run the queries as part of the batch processing operation.
- E. Use an AWS Glue extract, transform, and load (ETL) process to convert the .csv files into Apache Parquet format.
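Correct answer: B, E
Explanation
Partitioning the data in Amazon S3 by date and region (Option B) lets Athena prune partitions and scan only the prefixes that match the date and region named in the query, which reduces the data scanned, the query time, and the cost. Converting the .csv files to Apache Parquet with an AWS Glue ETL job (Option E) stores the data in a compressed, columnar format, so Athena reads only the columns a query references instead of parsing entire rows. Reducing object sizes to less than 128 MB (Option A) works against Athena: a large number of small files adds overhead for listing objects, opening files, and reading metadata, and AWS guidance for Athena is to avoid files smaller than about 128 MB. Storing the files as large, single objects (Option C) limits parallel reads, particularly for non-splittable formats such as gzip-compressed .csv. Kinesis Data Analytics (Option D) is designed for real-time streaming analytics, not for one-time queries over petabytes of historical batch data.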
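As an illustration of the partitioning in Option B, the sketch below builds Hive-style S3 object keys (`column=value` path segments, which Athena can map to partition columns) and a query whose WHERE clause filters on those partition columns. The prefix `sales`, the table name, the partition columns `sale_date` and `region`, and the data columns `product_id` and `quantity` are all hypothetical names chosen for the example; the question does not specify a schema.

```python
from datetime import date


def partition_key(prefix: str, sale_date: date, region: str, filename: str) -> str:
    """Build a Hive-style S3 object key, e.g.
    sales/sale_date=2021-03-14/region=eu-west-1/part-0000.parquet
    (prefix, column names, and filename are hypothetical)."""
    return f"{prefix}/sale_date={sale_date.isoformat()}/region={region}/{filename}"


def popular_products_query(table: str, sale_date: date, region: str, limit: int = 10) -> str:
    """Return an Athena SQL string that filters on the partition columns,
    so only the matching date/region partition needs to be scanned.
    Assumes sale_date is a string-typed partition column and that the
    table has product_id and quantity columns (hypothetical schema)."""
    return (
        f"SELECT product_id, SUM(quantity) AS units_sold\n"
        f"FROM {table}\n"
        f"WHERE sale_date = '{sale_date.isoformat()}' AND region = '{region}'\n"
        f"GROUP BY product_id\n"
        f"ORDER BY units_sold DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    day = date(2021, 3, 14)
    print(partition_key("sales", day, "eu-west-1", "part-0000.parquet"))
    print(popular_products_query("sales", day, "eu-west-1"))
```

A query that omits the `sale_date` and `region` predicates would still run, but Athena would then have to scan every partition, which is the situation the question describes.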