AWS Certified Solutions Architect – Professional — Question 725

A company deploys workloads in multiple AWS accounts. Each account has a VPC with VPC flow logs published in text log format to a centralized Amazon S3 bucket. Each log file is compressed with gzip compression. The company must retain the log files indefinitely.

A security engineer occasionally analyzes the logs by using Amazon Athena to query the VPC flow logs. The query performance is degrading over time as the number of ingested logs is growing. A solutions architect must improve the performance of the log analysis and reduce the storage space that the VPC flow logs use.

Which solution will meet these requirements with the LARGEST performance improvement?

Answer options

Correct answer: C

Explanation

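Configuring the VPC flow logs to be delivered to Amazon S3 in Apache Parquet format gives the largest Athena performance improvement. Parquet is a columnar format, so Athena reads only the columns a query references instead of decompressing and parsing every gzip-compressed text record, which reduces the volume of data scanned. Parquet files are also typically smaller than the equivalent gzip-compressed text logs, so they take less S3 storage. Hourly partitioning adds to the gain: when the Athena table is partitioned on the same keys, a query that filters on a time range scans only the matching partitions rather than the entire, ever-growing log set. The other approaches do not improve the query path to the same degree. Switching to bzip2 compression may shrink the files but leaves them as row-oriented text, and changing S3 storage classes affects storage cost, not the amount of data Athena must scan.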
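As an illustration only, the configuration described above might be expressed with the boto3 EC2 `create_flow_logs` call and its `DestinationOptions` parameter. This is a minimal sketch, not part of the original question: the helper names `build_flow_log_request` and `create_parquet_flow_log`, the `TrafficType` of `ALL`, and the choice to leave Hive-compatible prefixes disabled are all assumptions, as are any VPC ID and bucket ARN passed in.

```python
from typing import Any, Dict, List


def build_flow_log_request(vpc_ids: List[str], bucket_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 CreateFlowLogs API (hypothetical helper)."""
    return {
        "ResourceType": "VPC",
        "ResourceIds": vpc_ids,
        "TrafficType": "ALL",  # assumption: capture accepted and rejected traffic
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
        "DestinationOptions": {
            "FileFormat": "parquet",            # columnar output instead of plain text
            "HiveCompatiblePartitions": False,  # assumption: default S3 prefixes
            "PerHourPartition": True,           # partition the log files by hour
        },
    }


def create_parquet_flow_log(vpc_ids: List[str], bucket_arn: str) -> Dict[str, Any]:
    """Send the request; needs boto3 and AWS credentials (hypothetical helper)."""
    import boto3  # imported here so the request builder works without boto3

    ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(**build_flow_log_request(vpc_ids, bucket_arn))
```

The destination options of an existing flow log cannot be edited in place, so in practice this would likely mean creating a new flow log per VPC in each account and removing the old text-format one. Log files already delivered as gzip-compressed text stay in that format unless they are converted separately.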