AWS Certified Data Analytics – Specialty — Question 18

A university intends to use Amazon Kinesis Data Firehose to collect JSON-formatted batches of water quality readings in Amazon S3. The readings are from 50 sensors scattered across a local lake. Students will query the stored data using Amazon Athena to observe changes in a captured metric over time, such as water temperature or acidity. Interest has grown in the study, prompting the university to reconsider how data will be stored.
Which data format and partitioning choices will MOST significantly reduce costs? (Choose two.)

Answer options

Correct answer: B, D

Explanation

Partitioning the data by year, month, and day (Option B) lets Athena skip every partition outside the queried date range. Because Athena bills by the amount of data scanned, this lowers cost and speeds up the time-based queries the students run. Storing the data in Apache Parquet format with Snappy compression (Option D) reduces cost further: Parquet is columnar, so a query on a single metric such as water temperature reads only that column, and Snappy compression shrinks the bytes that are stored and scanned. The other options either use a row-oriented or uncompressed format, which forces Athena to scan more data, or use more complex partitioning that would not provide significant additional savings.
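
The reasoning above can be illustrated with a rough back-of-the-envelope sketch in Python. It is not part of the original question. The function names, the Hive-style `year=/month=/day=` prefix layout, the linear scan model, and the price constant are all assumptions chosen for illustration; real savings depend on the actual data and on current Athena pricing.

```python
from datetime import date, timedelta

# Assumed illustrative rate in USD per TB scanned; check current Athena pricing.
ASSUMED_PRICE_PER_TB_USD = 5.0


def partition_prefix(day):
    """Build a Hive-style year/month/day prefix for one day of readings."""
    return f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_range(start, end):
    """List the only prefixes a date-bounded query has to read (inclusive)."""
    days = (end - start).days + 1
    return [partition_prefix(start + timedelta(days=i)) for i in range(days)]


def estimated_scan_bytes(raw_bytes_per_day, days_queried,
                         columns_read, total_columns, compression_ratio):
    """Crude model: scanned bytes shrink with partition pruning (days_queried),
    column projection (columns_read / total_columns), and compression."""
    pruned = raw_bytes_per_day * days_queried
    projected = pruned * columns_read / total_columns
    return projected / compression_ratio


def scan_cost_usd(scanned_bytes, price_per_tb=ASSUMED_PRICE_PER_TB_USD):
    """Convert scanned bytes to an approximate cost at the assumed rate."""
    return scanned_bytes / 1e12 * price_per_tb


if __name__ == "__main__":
    # Hypothetical workload: 1 GB of raw JSON per day, 10 fields per reading.
    week = prefixes_for_range(date(2024, 1, 30), date(2024, 2, 5))
    print(week[0], "...", week[-1])

    # Unpartitioned raw JSON: a one-week query still scans a full year.
    full = estimated_scan_bytes(1e9, 365, 10, 10, 1.0)
    # Partitioned and columnar: one week, one metric, assumed 4x compression.
    lean = estimated_scan_bytes(1e9, 7, 1, 10, 4.0)
    print(f"full scan: ${scan_cost_usd(full):.4f}")
    print(f"lean scan: ${scan_cost_usd(lean):.6f}")
```

Under these assumed numbers, the two choices compound: partitioning limits which days are read, and the columnar, compressed format limits how many bytes are read per day.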