A company has significantly increased the amount of data that is stored as .csv files in…

Question

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take. An ML engineer must implement a solution to optimize the data for query performance. Which solution will meet this requirement with the LEAST operational overhead?

Accepted Answer

Correct answer: C. Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet format. Parquet is a columnar format, so query engines read only the columns a query needs, and the data compresses well; both reduce the amount of data scanned and speed up queries. AWS Glue is serverless, so there is no cluster to provision or manage. Options A and B do not optimize the stored data for query performance, and option D adds operational overhead because it requires managing an Amazon EMR cluster.
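As a rough illustration of the conversion job described in the answer, here is a minimal sketch in Python. It is not the exam's reference solution. The job name, role ARN, bucket paths, worker settings, and the helper names `build_glue_job_args` and `create_job` are all hypothetical, and the CSV files are assumed to have a header row. The Glue script is kept as a string so the example can be read and imported without the Glue runtime.

```python
# PySpark script the Glue job would run. In practice it would be uploaded to
# S3 and referenced by ScriptLocation; it is a string here so this file stays
# importable outside the Glue runtime.
GLUE_SCRIPT = '''
import sys
from awsglue.utils import getResolvedOptions
from pyspark.sql import SparkSession

args = getResolvedOptions(sys.argv, ["source_path", "target_path"])
spark = SparkSession.builder.getOrCreate()

# Read the CSV files (header row assumed) and rewrite them as Parquet.
df = spark.read.option("header", "true").csv(args["source_path"])
df.write.mode("overwrite").parquet(args["target_path"])
'''


def build_glue_job_args(job_name, role_arn, script_location,
                        source_path, target_path):
    """Return keyword arguments for boto3's Glue create_job call.

    All values are illustrative; worker type and count are assumptions.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",               # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        # Job parameters are read in the script via getResolvedOptions.
        "DefaultArguments": {
            "--source_path": source_path,
            "--target_path": target_path,
        },
    }


def create_job(job_args):
    """Create the job. Requires boto3 and valid AWS credentials."""
    import boto3
    return boto3.client("glue").create_job(**job_args)
```

Calling `build_glue_job_args(...)` only builds a dictionary; nothing is created in AWS until `create_job` is called with it. Because Glue runs the Spark job on managed workers, this path avoids the cluster administration that an EMR-based conversion would need.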

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 173

Explanation