AWS Certified Solutions Architect – Professional — Question 725
A company deploys workloads in multiple AWS accounts. Each account has a VPC with VPC flow logs published in text log format to a centralized Amazon S3 bucket. Each log file is compressed with gzip compression. The company must retain the log files indefinitely.
A security engineer occasionally analyzes the logs by using Amazon Athena to query the VPC flow logs. The query performance is degrading over time as the number of ingested logs is growing. A solutions architect must improve the performance of the log analysis and reduce the storage space that the VPC flow logs use.
Which solution will meet these requirements with the LARGEST performance improvement?
Answer options
- A. Create an AWS Lambda function to decompress the gzip files and to compress the files with bzip2 compression. Subscribe the Lambda function to an s3:ObjectCreated:Put S3 event notification for the S3 bucket.
- B. Enable S3 Transfer Acceleration for the S3 bucket. Create an S3 Lifecycle configuration to move files to the S3 Intelligent-Tiering storage class as soon as the files are uploaded.
- C. Update the VPC flow log configuration to store the files in Apache Parquet format. Specify hourly partitions for the log files.
- D. Create a new Athena workgroup without data usage control limits. Use Athena engine version 2.
Correct answer: C
Explanation
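Option C is the only choice that changes how the data is laid out for Athena. Apache Parquet is a columnar format, so Athena reads only the columns a query references instead of every field of every record, and Parquet's encoding and compression typically produce smaller files than gzip-compressed text. That reduces both the S3 storage footprint and the volume of data scanned per query. Hourly partitions add a second gain: a query that filters on a time range reads only the matching partitions rather than the whole log history, so performance no longer degrades as the bucket grows. The other options leave the data layout untouched. Option A swaps one compression codec for another, but the files remain row-oriented text that must be read in full. Option B does not help Athena at all: S3 Transfer Acceleration speeds up long-distance uploads to the bucket, and S3 Intelligent-Tiering changes how objects are billed, not how many bytes are stored or scanned. Option D removes workgroup data usage limits, which only cap how much data a query may scan; lifting them does not make any query faster.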
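As a rough illustration of option C, the sketch below builds the request for the EC2 `CreateFlowLogs` API with Parquet output and hourly partitions. The destination options of a flow log are set when it is created, so in practice this means creating a new flow log rather than editing the existing one. The helper names, the VPC ID, and the bucket ARN are hypothetical, and enabling Hive-compatible partition prefixes is an assumption that goes beyond what the question asks for.

```python
from typing import Any, Dict


def build_flow_log_request(vpc_id: str, bucket_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 CreateFlowLogs call.

    Hypothetical helper: it only assembles the request, so it can be
    inspected without AWS credentials.
    """
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "ALL",
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
        "DestinationOptions": {
            # Columnar output instead of the default plain-text format.
            "FileFormat": "parquet",
            # Assumption: Hive-style key=value prefixes, which make the
            # partitions easier to register in Athena.
            "HiveCompatiblePartitions": True,
            # One partition per hour, as the answer specifies.
            "PerHourPartition": True,
        },
    }


def create_parquet_flow_log(vpc_id: str, bucket_arn: str) -> Dict[str, Any]:
    """Send the request with boto3 (needs credentials and permissions)."""
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(**build_flow_log_request(vpc_id, bucket_arn))


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    request = build_flow_log_request(
        "vpc-0123456789abcdef0",
        "arn:aws:s3:::example-central-flow-logs",
    )
    print(request["DestinationOptions"])
```

Keeping the request builder separate from the API call lets the configuration be reviewed or unit-tested before anything is created in an account. Existing text-format log files stay in the bucket as they are; only newly delivered logs use the Parquet layout.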