AWS Certified Machine Learning – Specialty — Question 59

A monitoring service generates 1 TB of scale metrics record data every minute. A research team queries this data using Amazon Athena. The queries run slowly because of the large volume of data, and the team requires better performance.
How should the records be stored in Amazon S3 to improve query performance?

Answer options

Correct answer: B

Explanation

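Parquet is the best choice for storing this data in Amazon S3. It is a columnar format, so Athena reads only the columns a query references instead of scanning every full record, and values of the same type stored together compress well. Both effects reduce the amount of data scanned, which is what improves query performance. In contrast, CSV and compressed JSON are row-oriented: Athena must read each whole record even when a query needs only a few fields, and compression shrinks the files without changing that. RecordIO is used mainly for Amazon SageMaker training data and is not among Athena's supported query formats.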
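
The difference between row-oriented and columnar layouts can be illustrated with a small standard-library sketch. This is a toy model only, not Parquet's actual file format: the field names (ts, host, cpu, mem) and the record generator are hypothetical, and zlib stands in for the per-column compression a real columnar format applies. It compares the bytes a scan has to touch for a query that needs only the cpu column.

```python
import csv
import io
import zlib


def make_records(n):
    """Generate hypothetical metric records (field names are illustrative)."""
    return [
        {
            "ts": 1_700_000_000 + i,
            "host": f"host-{i % 50}",
            "cpu": (i * 7) % 100,
            "mem": (i * 13) % 100,
        }
        for i in range(n)
    ]


def row_layout_bytes(records):
    """Serialize as CSV. A query on one column still scans every byte."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return len(buf.getvalue().encode("utf-8"))


def column_layout_bytes(records):
    """Store each column separately and compress it; return bytes per column."""
    sizes = {}
    for name in records[0]:
        column = "\n".join(str(r[name]) for r in records).encode("utf-8")
        sizes[name] = len(zlib.compress(column))
    return sizes


records = make_records(10_000)
csv_scan = row_layout_bytes(records)      # bytes read for any query on the CSV
col_sizes = column_layout_bytes(records)  # bytes per independently stored column
cpu_scan = col_sizes["cpu"]               # bytes read for a query on cpu alone

print(f"row layout scan: {csv_scan} bytes")
print(f"columnar scan of 'cpu' only: {cpu_scan} bytes")
```

In this sketch the columnar scan touches only one compressed column, while the CSV scan reads every field of every record. A query engine reading a columnar format benefits in the same general way, although real formats add further optimizations that are not modeled here.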