AWS Certified Solutions Architect – Professional — Question 218
A company wants to analyze log data using date ranges with a custom application running on AWS. The application generates about 10 GB of data every day, which is expected to grow. A Solutions Architect is tasked with storing the data in Amazon S3 and using Amazon Athena to analyze the data.
Which combination of steps will ensure optimal performance as the data grows? (Choose two.)
Answer options
- A. Store each object in Amazon S3 with a random string at the front of each key.
- B. Store the data in multiple S3 buckets.
- C. Store the data in Amazon S3 in a columnar format, such as Apache Parquet or Apache ORC.
- D. Store the data in Amazon S3 in objects that are smaller than 10 MB.
- E. Store the data using Apache Hive partitioning in Amazon S3 using a key that includes a date, such as dt=2019-02.
Correct answer: C, E
Explanation
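C and E are correct. Storing the data in a columnar format such as Apache Parquet or Apache ORC lets Athena read only the columns a query references, and these formats compress well, so each query scans fewer bytes and the data costs less to store. Apache Hive partitioning on a date key (for example, dt=2019-02) lets Athena skip partitions that fall outside the queried date range, so it scans only the relevant data even as the total volume grows. Option A does not help: random key prefixes are no longer needed for S3 request performance, and they work against a date-based key layout. Option B does not help because spreading the data across multiple buckets does not reduce the amount of data Athena scans. Option D hurts performance, because many small objects add per-file overhead; Athena generally performs better with fewer, larger files.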
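As a rough illustration of option E, the sketch below builds Hive-style object keys and lists the partition prefixes that a date-range query would need to read. The function names, the "logs" prefix, the file name, and the daily granularity (dt=YYYY-MM-DD rather than the monthly dt=2019-02 in the option) are assumptions made for this example; they are not part of the question and do not call any AWS API.

```python
from datetime import date, timedelta
from typing import List


def partition_key(prefix: str, day: date, filename: str) -> str:
    """Build a Hive-style S3 object key, e.g. logs/dt=2019-02-14/part-0000.parquet."""
    return f"{prefix}/dt={day.isoformat()}/{filename}"


def partitions_in_range(prefix: str, start: date, end: date) -> List[str]:
    """List the partition prefixes covering start..end (inclusive).

    This mimics partition pruning: only these prefixes need to be read
    for a query filtered on the date range, regardless of how many other
    partitions exist under the same prefix.
    """
    days = (end - start).days
    return [
        f"{prefix}/dt={(start + timedelta(days=i)).isoformat()}/"
        for i in range(days + 1)
    ]


# Hypothetical values, chosen only for illustration.
key = partition_key("logs", date(2019, 2, 14), "part-0000.parquet")
needed = partitions_in_range("logs", date(2019, 2, 14), date(2019, 2, 16))
print(key)
print(needed)
```

A query restricted to three days touches three prefixes, however much data has accumulated under the other dates, which is why the layout keeps scan volume roughly constant as the dataset grows.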