A company wants to analyze log data using date ranges with a custom application running o…

Question

A company wants to analyze log data using date ranges with a custom application running on AWS. The application generates about 10 GB of data every day, which is expected to grow. A Solutions Architect is tasked with storing the data in Amazon S3 and using Amazon Athena to analyze the data.
Which combination of steps will ensure optimal performance as the data grows? (Choose two.)

Accepted Answer

Correct answer: C, E

C. Store the data in Amazon S3 in a columnar format, such as Apache Parquet or Apache ORC.
E. Store the data using Apache Hive partitioning in Amazon S3 using a key that includes a date, such as dt=2019-02.

A columnar format such as Apache Parquet or Apache ORC lets Athena read only the columns a query references, and it compresses better than row-oriented text, so each query scans fewer bytes and storage costs drop. Apache Hive partitioning by date lets Athena prune partitions, so a date-range query scans only the data under the matching dt= prefixes rather than the whole data set. Both effects matter more as the data grows. Options A, B, and D do not provide the same level of performance optimization for large-scale data analysis.
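To make the date-based partitioning in option E concrete, here is a minimal sketch of how monthly Hive-style prefixes line up with a date range. The `logs` base prefix and both function names are hypothetical, chosen only for illustration; the `dt=YYYY-MM` key follows the example in the answer.

```python
from datetime import date
from typing import List


def partition_prefix(day: date, base: str = "logs") -> str:
    """Return the Hive-style S3 key prefix for the month containing `day`.

    `base` is a hypothetical top-level prefix, not something the question specifies.
    """
    return f"{base}/dt={day:%Y-%m}/"


def prefixes_for_range(start: date, end: date, base: str = "logs") -> List[str]:
    """List the monthly partition prefixes that overlap [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    prefixes = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month that contains `end`.
    while (year, month) <= (end.year, end.month):
        prefixes.append(f"{base}/dt={year:04d}-{month:02d}/")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return prefixes


if __name__ == "__main__":
    for prefix in prefixes_for_range(date(2018, 12, 15), date(2019, 2, 3)):
        print(prefix)
```

A query filtered on the partition column over that range would only need the objects under the listed prefixes; everything stored under other dt= values can be skipped. A finer key such as dt=2019-02-14 would follow the same pattern if daily pruning is wanted.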

AWS Certified Solutions Architect – Professional — Question 218
