AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 85

A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The S3 bucket contains the following file types in the same S3 folder: CSV, JSON, XLSX, and Apache Parquet.

An ML engineer must implement a solution that uses AWS Glue DataBrew to process the data. The ML engineer also must store the final output in Amazon S3 so that AWS Glue can consume the output in the future.

Which solution will meet these requirements?

Answer options

Correct answer: C

Explanation

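Option C is correct because an AWS Glue DataBrew dataset that points to an S3 folder expects the files in that folder to share one format. The mixed CSV, JSON, XLSX, and Parquet files therefore have to be separated by type first, and each type processed on its own. Storing the output in Apache Parquet format gives AWS Glue a columnar format that it can read efficiently later. Options A and B do not separate the data by type, so DataBrew would have to read mixed formats from a single folder. Option D does separate the data, but it specifies the AWS Glue Parquet format, which is not needed for AWS Glue to consume the output.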
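
The separation step in Option C can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the exam answer: the `by-type/` prefixes, the function names, and the sample keys are all hypothetical. The `parquet_output` helper builds a dictionary in the shape that one entry of the `Outputs` parameter of the boto3 DataBrew `create_recipe_job` call is understood to take. No AWS call is made here.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a per-type S3 prefix.
TYPE_PREFIXES = {
    ".csv": "by-type/csv/",
    ".json": "by-type/json/",
    ".xlsx": "by-type/xlsx/",
    ".parquet": "by-type/parquet/",
}


def plan_moves(keys):
    """Map each source key to a destination key under its file-type prefix.

    Keys with an unrecognized extension are left out of the plan.
    """
    moves = {}
    for key in keys:
        path = PurePosixPath(key)
        prefix = TYPE_PREFIXES.get(path.suffix.lower())
        if prefix is not None:
            moves[key] = prefix + path.name
    return moves


def parquet_output(bucket, prefix):
    """Build one output entry that requests Apache Parquet (assumed shape)."""
    return {"Format": "PARQUET", "Location": {"Bucket": bucket, "Key": prefix}}


if __name__ == "__main__":
    sample_keys = ["raw/a.csv", "raw/b.JSON", "raw/c.xlsx", "raw/d.parquet"]
    for source, destination in plan_moves(sample_keys).items():
        print(source, "->", destination)
    print(parquet_output("example-bucket", "processed/csv/"))
```

Each per-type prefix would then back its own DataBrew dataset and job, with the copy itself done by whatever S3 tooling is already in use.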