AWS Certified Data Engineer – Associate (DEA-C01) — Question 235

A company processes 500 GB of audience and advertising data daily. The company stores the data as CSV files in Amazon S3, with schemas registered in the AWS Glue Data Catalog. The company needs to convert these files to Apache Parquet format and store them in an S3 bucket.

The solution requires a long-running workflow that needs 15 GiB of memory and processes the audience data and the advertising data concurrently. A correlation process must begin only after these first two processes complete.

Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: C

Explanation

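The correct answer is C. AWS Glue is serverless and built for ETL, so converting CSV files whose schemas are already registered in the AWS Glue Data Catalog to Parquet requires no additional infrastructure to manage. A Glue workflow can start the two processing jobs in parallel from a single trigger, and a conditional trigger can start the correlation job only after both jobs succeed, which satisfies the ordering requirement. Option A is viable but adds an Amazon MWAA (Apache Airflow) environment to provision and maintain. Option B adds Amazon EMR and Amazon SQS, which means more components to configure and operate. Option D relies on AWS Step Functions and AWS Lambda, but a Lambda function is limited to 10 GB of memory and a 15-minute timeout, so it cannot run a long-running process that needs 15 GiB of memory.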
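
As an illustration of how a Glue workflow can express "run two jobs in parallel, then a third after both succeed", here is a minimal Python sketch. It only builds the request parameters for the boto3 Glue `create_workflow` and `create_trigger` calls. The workflow name, job names, trigger names, and schedule are hypothetical placeholders, not part of the question, and the three Glue jobs are assumed to exist already.

```python
# Minimal sketch, not a definitive implementation.
# All names and the schedule below are hypothetical placeholders.
WORKFLOW = "daily-ad-audience-workflow"
AUDIENCE_JOB = "convert-audience-csv-to-parquet"
ADVERTISING_JOB = "convert-advertising-csv-to-parquet"
CORRELATION_JOB = "correlate-audience-advertising"
DAILY_SCHEDULE = "cron(0 2 * * ? *)"  # assumed: once a day at 02:00 UTC


def start_trigger(workflow, jobs, schedule):
    """Scheduled trigger whose actions start every job in `jobs` in parallel."""
    return {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job} for job in jobs],
        "StartOnCreation": True,
    }


def join_trigger(workflow, upstream_jobs, downstream_job):
    """Conditional trigger that fires only after ALL upstream jobs succeed."""
    return {
        "Name": f"{workflow}-join",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Logical": "AND",  # every condition must hold, not just one
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
                for job in upstream_jobs
            ],
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


def deploy(glue_client):
    """Create the workflow and both triggers with a boto3 Glue client."""
    glue_client.create_workflow(Name=WORKFLOW)
    glue_client.create_trigger(
        **start_trigger(WORKFLOW, [AUDIENCE_JOB, ADVERTISING_JOB], DAILY_SCHEDULE)
    )
    glue_client.create_trigger(
        **join_trigger(WORKFLOW, [AUDIENCE_JOB, ADVERTISING_JOB], CORRELATION_JOB)
    )
```

With AWS credentials configured, `deploy(boto3.client("glue"))` would register the workflow. The `"Logical": "AND"` predicate is what enforces that the correlation job waits for both conversion jobs rather than starting when either one finishes.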