AWS Certified Data Analytics – Specialty — Question 18
A university intends to use Amazon Kinesis Data Firehose to collect JSON-formatted batches of water quality readings in Amazon S3. The readings are from 50 sensors scattered across a local lake. Students will query the stored data using Amazon Athena to observe changes in a captured metric over time, such as water temperature or acidity. Interest has grown in the study, prompting the university to reconsider how data will be stored.
Which data format and partitioning choices will MOST significantly reduce costs? (Choose two.)
Answer options
- A. Store the data in Apache Avro format using Snappy compression.
- B. Partition the data by year, month, and day.
- C. Store the data in Apache ORC format using no compression.
- D. Store the data in Apache Parquet format using Snappy compression.
- E. Partition the data by sensor, year, month, and day.
Correct answer: B, D
Explanation
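Athena charges by the amount of data each query scans, so the cheapest layout is the one that lets queries skip the most bytes. Apache Parquet with Snappy compression (Option D) is a columnar format, so a query that tracks a single metric such as water temperature reads only that column, and Snappy shrinks the bytes it does read. Partitioning by year, month, and day (Option B) lets a time-bounded query skip every partition outside the requested date range. Avro (Option A) is row-based, so Athena must read whole records even when a query needs only one field. ORC (Option C) is also columnar, but leaving it uncompressed means more bytes are stored and scanned than necessary. Adding the sensor to the partition key (Option E) multiplies the number of partitions by 50 and splits each day's readings into many small files, which adds overhead without reducing the data scanned for queries that follow a metric over time across the lake.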
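The partitioning trade-off between Options B and E can be illustrated with a small sketch. It is not part of the exam material: the function names and the Hive-style `year=/month=/day=` prefix layout are assumptions chosen for illustration, and only the sensor count of 50 comes from the question. It counts how many partitions a date-range query would touch under each scheme.

```python
from datetime import date, timedelta
from typing import List

SENSOR_COUNT = 50  # from the question: 50 sensors across the lake


def date_prefix(day: date) -> str:
    """Build a Hive-style year/month/day prefix (assumed layout for Option B)."""
    return f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def days_between(start: date, end: date) -> List[date]:
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def partitions_touched(start: date, end: date, by_sensor: bool = False) -> int:
    """Count partitions a query over all sensors reads for a date range.

    With date-only partitioning there is one partition per day.
    With sensor + date partitioning there is one per sensor per day.
    """
    n_days = len(days_between(start, end))
    return n_days * SENSOR_COUNT if by_sensor else n_days


if __name__ == "__main__":
    start, end = date(2024, 3, 1), date(2024, 3, 7)
    print(date_prefix(start))
    print("date only:", partitions_touched(start, end))
    print("sensor + date:", partitions_touched(start, end, by_sensor=True))
```

For a one-week query across all sensors, both schemes cover the same readings, but the sensor-level scheme spreads them over 50 times as many partitions. The date filter already prunes the irrelevant data in either case, so the extra level mainly adds more and smaller objects to list and open.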