Databricks Certified Associate Developer for Apache Spark — Question 175
A Spark engineer is troubleshooting a Spark application that has been encountering out-of-memory errors during execution. By reviewing the Spark driver logs, the engineer notices multiple “GC overhead limit exceeded” messages.
Which action should the engineer take to resolve this issue?
Answer options
- A. Optimize the data processing logic by repartitioning the DataFrame.
- B. Modify the Spark configuration to disable garbage collection.
- C. Increase the memory allocated to the Spark Driver.
- D. Cache large DataFrames to persist them in memory.
Correct answer: C
Explanation
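“GC overhead limit exceeded” is a form of java.lang.OutOfMemoryError that the JVM raises when it spends nearly all of its time in garbage collection while reclaiming very little heap. Because the messages appear in the driver logs, the driver's heap is too small for its workload, so increasing the memory allocated to the Spark Driver (for example through spark.driver.memory or the --driver-memory option of spark-submit) addresses the cause directly. The other options do not. Repartitioning (A) changes how data is spread across executor tasks but does not give the driver more heap. Garbage collection (B) cannot simply be switched off, and without it the heap would fill even faster. Caching large DataFrames (D) consumes additional memory rather than freeing it.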
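As a rough illustration of option C, the sketch below builds a spark-submit command line that raises the driver heap with the --driver-memory option. The helper name build_submit_command, the script name my_app.py, and the 8g value are hypothetical, chosen only for this example; the right size depends on the workload and the memory available on the driver host.

```python
def build_submit_command(app_path, driver_memory="4g"):
    """Return a spark-submit argument list that sets the driver heap size.

    app_path and driver_memory are illustrative placeholders; tune the
    memory value to the actual workload and cluster.
    """
    # --driver-memory is the spark-submit option for the driver JVM heap size.
    return ["spark-submit", "--driver-memory", driver_memory, app_path]


# Hypothetical usage: request 8 GB of driver memory for my_app.py.
command = build_submit_command("my_app.py", driver_memory="8g")
print(" ".join(command))  # spark-submit --driver-memory 8g my_app.py
```

The memory is set on the command line rather than in application code because, in client mode, the driver JVM has already started by the time the application's SparkConf is read, so changing spark.driver.memory there has no effect. The same setting can also go in the default properties file.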