AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 173

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take.

An ML engineer must implement a solution to optimize the data for query performance.

Which solution will meet this requirement with the LEAST operational overhead?

Answer options

Correct answer: C

Explanation

The correct answer is C because converting the .csv files to Apache Parquet stores the data in a columnar format. Queries then read only the columns they reference, and column-wise storage compresses better than row-wise text, so less data is scanned and query performance improves. Options A and B do not optimize the stored data for query performance, and option D adds operational overhead because it requires managing an EMR cluster.
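
The effect of columnar storage on the amount of data a query must read can be illustrated with a small standard-library sketch. This is a toy model, not Parquet itself: real Parquet files also use row groups, encodings, compression, and column statistics. The dataset, column names, and helper functions below are all hypothetical and exist only for illustration.

```python
import csv
import io
import random


def make_rows(n, seed=0):
    """Build a small synthetic dataset (hypothetical columns)."""
    rng = random.Random(seed)
    return [
        {
            "order_id": i,
            "customer": f"customer-{rng.randrange(1000)}",
            "amount": rng.randrange(1, 500),
            "note": "x" * 40,  # wide column that the query never needs
        }
        for i in range(n)
    ]


def to_csv_bytes(rows):
    """Row-oriented layout: every record stores all of its fields together."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_columnar_bytes(rows):
    """Toy column-oriented layout: one independent byte blob per column."""
    return {
        name: "\n".join(str(row[name]) for row in rows).encode("utf-8")
        for name in rows[0]
    }


def sum_amount_from_csv(blob):
    """Summing one column from CSV still requires reading every byte."""
    reader = csv.DictReader(io.StringIO(blob.decode("utf-8")))
    total = sum(int(row["amount"]) for row in reader)
    return total, len(blob)


def sum_amount_from_columns(columns):
    """The columnar layout lets the query read only the 'amount' blob."""
    blob = columns["amount"]
    total = sum(int(value) for value in blob.decode("utf-8").split("\n"))
    return total, len(blob)


rows = make_rows(10_000)
csv_total, csv_bytes_read = sum_amount_from_csv(to_csv_bytes(rows))
col_total, col_bytes_read = sum_amount_from_columns(to_columnar_bytes(rows))

print(f"same result: {csv_total == col_total}")
print(f"bytes read from CSV:      {csv_bytes_read}")
print(f"bytes read from columnar: {col_bytes_read}")
```

Both layouts return the same sum, but the columnar read touches only one column's bytes. The same principle is why query engines scan less data on Parquet than on .csv when a query selects a subset of columns.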