AWS Certified Data Engineer – Associate (DEA-C01) — Question 136

A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs.

The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time.

Which solution will MOST reduce the data processing time?

Answer options

Correct answer: B

Explanation

The correct answer is B. The AWS Glue dynamic frame file-grouping option lets each ETL task read a group of small input files into a single in-memory partition instead of handling every file separately. With millions of 1 KB JSON files, that per-file overhead dominates the run time, so grouping the files removes most of it and shortens processing. Option A involves additional steps that can add latency. Option C does not address the conversion to Apache Parquet format. Option D may not provide the same level of integration and efficiency as AWS Glue in this scenario.
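
The file-grouping option can be sketched as follows. This is a minimal illustration, not the job from the scenario: the helper names (`grouped_s3_options`, `estimate_groups`, `read_grouped_json`), the bucket path, and the 128 MiB group size are assumptions chosen for the example. The `groupFiles` and `groupSize` connection options and the `create_dynamic_frame.from_options` call are the ones AWS Glue documents for reading S3 input in larger groups.

```python
import math

# Assumed target group size: 128 MiB. Glue expects groupSize in bytes, as a string.
DEFAULT_GROUP_SIZE_BYTES = 128 * 1024 * 1024


def grouped_s3_options(path, group_size_bytes=DEFAULT_GROUP_SIZE_BYTES):
    """Build S3 connection options that turn on Glue file grouping."""
    return {
        "paths": [path],
        "recurse": True,
        # Group files that sit in the same S3 partition into one read unit.
        "groupFiles": "inPartition",
        "groupSize": str(group_size_bytes),
    }


def estimate_groups(file_count, file_size_bytes,
                    group_size_bytes=DEFAULT_GROUP_SIZE_BYTES):
    """Rough lower bound on the number of groups for equally sized files.

    Real grouping also depends on how the files are spread across S3
    partitions, so treat this only as an order-of-magnitude estimate.
    """
    total_bytes = file_count * file_size_bytes
    return math.ceil(total_bytes / group_size_bytes)


def read_grouped_json(glue_context, path):
    """Read small JSON files as grouped input inside a Glue job.

    glue_context is an awsglue.context.GlueContext created by the job;
    it is passed in so this module can be imported outside of Glue.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=grouped_s3_options(path),
        format="json",
    )


if __name__ == "__main__":
    # Hypothetical bucket path, for illustration only.
    print(grouped_s3_options("s3://example-bucket/test-results/"))
    print(estimate_groups(1_000_000, 1024))
```

Under these assumptions, one million 1 KB files collapse into a handful of read groups instead of one million separate file reads, which is where the time saving in option B comes from.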