AWS Certified Data Engineer – Associate (DEA-C01) — Question 230

A data engineer is configuring an AWS Glue Apache Spark extract, transform, and load (ETL) job. The job contains a sort-merge join of two large and equally sized DataFrames.

The job is failing with the following error: No space left on device.

Which solution will resolve the error?

Answer options

Correct answer: A

Explanation

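The correct answer is A. A sort-merge join shuffles both DataFrames across the cluster, and when the shuffle and spill data outgrows the local disk on the workers, Spark fails with "No space left on device". The AWS Glue Spark shuffle manager writes shuffle data to Amazon S3 instead of local disk, which removes the local disk limit for shuffle-heavy operations such as sort-merge joins. Option B is not a practical fix: AWS Glue is a serverless service whose workers are managed by AWS, so additional storage volumes cannot be attached to them. Option C is unsuitable because a broadcast join requires one DataFrame to be small enough to fit in memory on every executor, and here both DataFrames are large and equally sized. Option D does not solve the space problem either, because changing the data structure still leaves a join that must shuffle the same volume of data.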
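
As a rough illustration of option A, the sketch below builds the job arguments that turn on the S3-backed shuffle. The parameter names follow the AWS Glue documentation for the Spark shuffle plugin as best understood here; the helper name shuffle_to_s3_arguments and the bucket URI are hypothetical, and the exact keys can differ between AWS Glue versions, so check them against the documentation for the version in use.

```python
def shuffle_to_s3_arguments(shuffle_bucket_uri: str) -> dict:
    """Build Glue job arguments that send Spark shuffle files to Amazon S3.

    Hypothetical helper for illustration; it only assembles a dictionary
    and does not call any AWS API.
    """
    if not shuffle_bucket_uri.startswith("s3://"):
        raise ValueError("shuffle location must be an s3:// URI")
    return {
        # Enables the Glue Spark shuffle manager so shuffle files go to S3.
        "--write-shuffle-files-to-s3": "true",
        # Tells the shuffle manager which S3 location to write to.
        "--conf": f"spark.shuffle.glue.s3ShuffleBucket={shuffle_bucket_uri}",
    }


# The bucket name below is a placeholder, not a real resource.
args = shuffle_to_s3_arguments("s3://example-shuffle-bucket/prefix/")
for key, value in args.items():
    print(key, value)
```

A dictionary like this could be supplied as the job's default arguments (for example, the DefaultArguments field when the job is created through the AWS SDK) or entered as job parameters in the console. The job's IAM role would also need read and write access to the chosen bucket.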