A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The…

Question

A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The S3 bucket contains the following file types in the same S3 folder: CSV, JSON, XLSX, and Apache Parquet. An ML engineer must implement a solution that uses AWS Glue DataBrew to process the data. The ML engineer must also store the final output in Amazon S3 so that AWS Glue can consume the output in the future. Which solution will meet these requirements?

Accepted Answer

Correct answer: C. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in Apache Parquet format.

Option C is correct because DataBrew can process all the files in an S3 folder as one dataset only when they share the same format, so each file type needs its own folder and its own processing run. Storing the output as Apache Parquet gives AWS Glue a columnar format that it can read efficiently. Options A and B leave the mixed file types in one folder, which DataBrew cannot process as a single dataset. Option D separates the data but specifies AWS Glue Parquet format, which is not necessary because AWS Glue can consume standard Apache Parquet.
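The separation step in option C can be sketched in Python. This is a minimal illustration, not a reference implementation: the `by-type/` and `output/` prefixes and every function name are hypothetical, and the S3 client is assumed to be a boto3 S3 client created by the caller. The output dictionary follows the shape of one entry in the `Outputs` list that boto3's DataBrew `create_recipe_job` accepts.

```python
import posixpath
from typing import Dict, Iterable, Optional

# Extensions named in the question, mapped to hypothetical per-type folder names.
TYPE_FOLDERS = {".csv": "csv", ".json": "json", ".xlsx": "xlsx", ".parquet": "parquet"}


def target_key(key: str, base_prefix: str = "by-type/") -> Optional[str]:
    """Return the per-type destination key, or None for an unrecognized extension."""
    ext = posixpath.splitext(key)[1].lower()
    folder = TYPE_FOLDERS.get(ext)
    if folder is None:
        return None
    # Only the file name is kept, which assumes names are unique within the
    # single source folder described in the question.
    return f"{base_prefix}{folder}/{posixpath.basename(key)}"


def plan_moves(keys: Iterable[str], base_prefix: str = "by-type/") -> Dict[str, str]:
    """Map each source key to its destination key, skipping unrecognized types."""
    plan = {}
    for key in keys:
        dest = target_key(key, base_prefix)
        if dest is not None:
            plan[key] = dest
    return plan


def apply_moves(s3_client, bucket: str, plan: Dict[str, str]) -> None:
    """Copy each object to its per-type folder within the same bucket.

    Assumes s3_client is a boto3 S3 client. copy_object handles objects up to
    5 GB; larger objects would need a multipart copy instead.
    """
    for source, dest in plan.items():
        s3_client.copy_object(
            Bucket=bucket,
            Key=dest,
            CopySource={"Bucket": bucket, "Key": source},
        )


def parquet_output(bucket: str, folder: str) -> dict:
    """Build one Outputs entry for a DataBrew job that writes Apache Parquet."""
    return {
        "Format": "PARQUET",
        "Location": {"Bucket": bucket, "Key": f"output/{folder}/"},
    }
```

With the files separated, each per-type folder would back its own DataBrew dataset and job, and each job would pass the dictionary from `parquet_output` in its `Outputs` list so that every run writes Apache Parquet.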

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 85

Answer options

Correct answer: C

Explanation