
Question

A company wants to measure the effectiveness of its recent marketing campaigns. The company performs batch processing on .csv files of sales data and stores the results in an Amazon S3 bucket once every hour. The S3 bucket contains petabytes of objects. The company runs one-time queries in Amazon Athena to determine which products are most popular on a particular date for a particular region. Queries sometimes fail or take longer than expected to finish running.
Which actions should a solutions architect take to improve the query performance and reliability? (Choose two.)

Accepted Answer

Correct answer: A, B

A. Reduce the S3 object sizes to less than 128 MB.
B. Partition the data by date and region in Amazon S3.

Partitioning the data in Amazon S3 by date and region (Option B) lets Athena prune partitions: a query that filters on one date and one region scans only the matching prefixes instead of the whole bucket, which cuts the amount of data read and therefore both query time and cost. Keeping individual objects below 128 MB (Option A) avoids the bottleneck of processing very large single files and lets Athena read the data in parallel more efficiently. Storing the data as large, single objects (Option C) limits parallel processing, and Amazon Kinesis Data Analytics (Option D) is designed for real-time streaming analytics, not for querying petabytes of historical batch data.
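
The effect of Option B can be sketched in a few lines of Python. This is only an illustration of partition pruning, not how Athena is implemented: the `sales` prefix, the file names, and the helper functions `partition_key` and `prune` are all hypothetical, and the key layout assumes Hive-style `column=value` path segments.

```python
from datetime import date

# Hypothetical top-level prefix inside the bucket (an assumption for this sketch).
SALES_PREFIX = "sales"


def partition_key(sale_date: date, region: str, filename: str) -> str:
    """Build a Hive-style partitioned object key: sales/date=.../region=.../file."""
    return f"{SALES_PREFIX}/date={sale_date.isoformat()}/region={region}/{filename}"


def prune(keys: list[str], sale_date: date, region: str) -> list[str]:
    """Keep only the keys under the partition that matches the query's filters."""
    prefix = f"{SALES_PREFIX}/date={sale_date.isoformat()}/region={region}/"
    return [key for key in keys if key.startswith(prefix)]


# Three hourly batch outputs written to different partitions.
keys = [
    partition_key(date(2021, 3, 1), "us-east-1", "part-0000.csv"),
    partition_key(date(2021, 3, 1), "eu-west-1", "part-0000.csv"),
    partition_key(date(2021, 3, 2), "us-east-1", "part-0000.csv"),
]

# A query for one date and one region only needs to read one of the three objects.
selected = prune(keys, date(2021, 3, 1), "us-east-1")
print(selected)
# → ['sales/date=2021-03-01/region=us-east-1/part-0000.csv']
```

With an unpartitioned layout, every object would have to be scanned to answer the same question; with the partitioned layout, the date and region filters narrow the scan to a single prefix.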

AWS Certified Solutions Architect – Associate (SAA-C02) — Question 588
