A company has an application that places hundreds of .csv files into an Amazon S3 bucket…

Question

A company has an application that places hundreds of .csv files into an Amazon S3 bucket every hour. The files are 1 GB in size. Each time a file is uploaded, the company needs to convert the file to Apache Parquet format and place the output file into an S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

Correct answer: D. Create an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Parquet format and place the output files into an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the ETL job.

AWS Glue is a fully managed, serverless ETL service that handles large-scale file conversions, such as 1 GB .csv files to Apache Parquet, without any infrastructure to provision. That makes Option D the choice with the least operational overhead. Option A is unsuitable because downloading and converting 1 GB files inside AWS Lambda risks exceeding the function's execution time and memory limits. Option B adds operational overhead because it requires managing Apache Spark infrastructure. Option C is more complex than necessary and does not convert each file as it is uploaded, since it relies on Amazon Athena queries run on a periodic schedule.
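The Lambda half of Option D can be sketched as follows. This is a minimal illustration, not a reference implementation: the job name `csv-to-parquet`, the output location, the environment variable names, and the `--input_path` / `--output_path` job arguments are all hypothetical and would have to match whatever the actual Glue job script reads. The optional `glue_client` parameter is only there so the handler can be exercised without AWS credentials.

```python
import os
import urllib.parse

# Hypothetical names: replace with the real Glue job and output location.
JOB_NAME = os.environ.get("GLUE_JOB_NAME", "csv-to-parquet")
OUTPUT_PATH = os.environ.get("OUTPUT_PATH", "s3://example-output-bucket/parquet/")


def build_job_arguments(record):
    """Turn one S3 event record into arguments for the Glue job run."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {
        "--input_path": f"s3://{bucket}/{key}",  # assumed job parameter
        "--output_path": OUTPUT_PATH,            # assumed job parameter
    }


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run per uploaded .csv object."""
    if glue_client is None:
        import boto3  # available by default in the Lambda Python runtime
        glue_client = boto3.client("glue")

    run_ids = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if not key.lower().endswith(".csv"):
            continue  # ignore anything that is not a .csv upload
        response = glue_client.start_job_run(
            JobName=JOB_NAME,
            Arguments=build_job_arguments(record),
        )
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```

The function only starts the job and returns, so the 1 GB conversion itself runs in Glue rather than in Lambda. With hundreds of uploads per hour, the Glue job's maximum concurrent runs setting would likely need to be raised above its default so that overlapping runs are not rejected.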

AWS Certified Solutions Architect – Associate (SAA-C02) — Question 697

Explanation