AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 41

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.
Which file format will meet these requirements?

Answer options

Correct answer: D (Apache Parquet)

Explanation

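Apache Parquet is a columnar, binary file format built for compact storage and fast analytical reads, which makes it a good fit for large or complex ML datasets. Parquet groups values by column, stores its schema with the data, supports nested data types, and compresses each column separately. A reader can therefore skip columns it does not need and avoid converting text into typed values. CSV and JSON are row-oriented text formats that must be parsed in full. Gzip compression reduces file size, but it adds a decompression step and does not change how the underlying text is parsed. All of these formats can be used, but on large datasets they typically take longer to process than Parquet.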
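The columnar advantage can be illustrated with a toy model. This sketch is not Parquet and does not use any Parquet library. It assumes a small hypothetical four-column dataset and counts how many field values a reader must parse to extract a single column, first from row-oriented CSV text and then from a simple column-per-list store. Real Parquet adds binary encoding, compression, and per-column statistics on top of this layout.

```python
import csv
import io

# Hypothetical sample data (illustrative only): 3 rows x 4 columns.
ROWS = [
    {"id": "1", "age": "34", "city": "Seattle", "label": "0"},
    {"id": "2", "age": "29", "city": "Austin", "label": "1"},
    {"id": "3", "age": "41", "city": "Boston", "label": "0"},
]


def to_csv(rows):
    """Serialize rows as row-oriented CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_columnar(rows):
    """Toy column store: one list per column (no encoding or compression)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def read_column_from_csv(text, name):
    """Extract one column from CSV; every field of every row is parsed."""
    values, fields_parsed = [], 0
    for row in csv.DictReader(io.StringIO(text)):
        fields_parsed += len(row)  # the whole row is parsed to reach one field
        values.append(row[name])
    return values, fields_parsed


def read_column_from_columnar(store, name):
    """Extract one column from the toy column store; only that column is touched."""
    values = store[name]
    return values, len(values)


csv_text = to_csv(ROWS)
store = to_columnar(ROWS)
ages_csv, csv_cost = read_column_from_csv(csv_text, "age")
ages_col, col_cost = read_column_from_columnar(store, "age")
print(ages_csv == ages_col, csv_cost, col_cost)  # True 12 3
```

Both paths return the same values, but the CSV reader parses all 12 fields while the columnar read touches only the 3 values it needs. The gap grows with the number of columns, which is one reason wide or complex datasets tend to load faster from Parquet.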