AWS Certified Data Engineer – Associate (DEA-C01) — Question 212

A data engineer develops an AWS Glue Apache Spark ETL job to perform transformations on a dataset. When the data engineer runs the job, the job returns an error that reads, “No space left on device.”

The data engineer needs to identify the source of the error and provide a solution.

Which combination of steps will meet this requirement MOST cost-effectively? (Choose two.)

Answer options

Correct answer: B, D

Explanation

Option B is correct because the Spark UI and AWS Glue job metrics let the data engineer check for data skew, a common root cause of this error: when a few partitions are much larger than the rest, the workers that process them can run out of local disk space during shuffle. Option D is correct because setting the --write-shuffle-files-to-s3 job parameter makes the job write shuffle files to Amazon S3 instead of local disk, which relieves the disk pressure without paying for larger or additional workers. The other options either do not address the shortage of local disk space or do so at a higher cost.
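
As an illustration of how options B and D translate into job settings, here is a minimal Python sketch that builds the job arguments and passes them to a job run through boto3. The function names, the job name, and the S3 URIs are hypothetical placeholders, not part of the question. The sketch assumes the job runs on a Glue version that supports the S3 shuffle plugin and that the job role can write to the given locations.

```python
from __future__ import annotations


def build_job_arguments(shuffle_bucket_uri: str, event_logs_uri: str) -> dict[str, str]:
    """Build Glue job arguments for diagnosing skew and offloading shuffle data.

    Both URIs are placeholders supplied by the caller (assumed to be S3 paths).
    """
    for uri in (shuffle_bucket_uri, event_logs_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")

    return {
        # Option B: expose Spark UI event logs and job metrics to inspect skew.
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_logs_uri,
        # Glue documents this key as needing no value; "true" is passed here
        # only because the API expects string values (an assumption).
        "--enable-metrics": "true",
        # Option D: write shuffle files to S3 instead of local disk.
        "--write-shuffle-files-to-s3": "true",
        "--conf": f"spark.shuffle.glue.s3ShuffleBucket={shuffle_bucket_uri}",
    }


def start_job(job_name: str, arguments: dict[str, str]) -> str:
    """Start a Glue job run with the given arguments and return its run ID."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    glue = boto3.client("glue")
    response = glue.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]
```

A call might look like `start_job("my-etl-job", build_job_arguments("s3://example-bucket/shuffle/", "s3://example-bucket/spark-logs/"))`, where every name is a placeholder. Writing shuffle data to S3 adds request and storage charges, so it is worth confirming in the Spark UI that shuffle spill is the actual cause before enabling it.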