Databricks Certified Associate Developer for Apache Spark — Question 201
A data engineer is investigating a Spark cluster that is underutilized during scheduled batch jobs. The Spark logs show that tasks are often killed due to timeout errors, along with several warnings about insufficient resources.
Which action should the engineer take to resolve the underutilization issue?
Answer options
- A. Increase the executor memory allocation in the Spark configuration.
- B. Set the spark.network.timeout property to allow tasks more time to complete without being killed.
- C. Increase the number of executor instances to handle more concurrent tasks.
- D. Reduce the size of the data partitions to improve task scheduling.
Correct answer: C
Explanation
Increasing the number of executor instances lets the cluster run more tasks concurrently, which addresses the underutilization issue directly. Increasing executor memory (A) or raising spark.network.timeout (B) may help individual tasks complete, but neither adds capacity for running more tasks at once. Reducing partition size (D) could improve scheduling, but it does not necessarily address the core issue of underutilization.
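The capacity argument can be illustrated with a little arithmetic. The sketch below is not part of the original question: the helper names `task_slots` and `waves`, the executor counts, and the 200-task stage are all hypothetical. It assumes static allocation, where `spark.executor.instances` fixes the executor count, and it uses the standard relationship that each executor can run `spark.executor.cores // spark.task.cpus` tasks at a time.

```python
import math


def task_slots(executor_instances: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Number of tasks the application can run at the same time.

    Each executor offers executor_cores // task_cpus slots
    (spark.executor.cores and spark.task.cpus respectively).
    """
    return executor_instances * (executor_cores // task_cpus)


def waves(num_tasks: int, slots: int) -> int:
    """Number of scheduling rounds needed to run num_tasks tasks on the given slots."""
    return math.ceil(num_tasks / slots)


# Hypothetical settings before and after applying option C.
conf_before = {"spark.executor.instances": 4, "spark.executor.cores": 4}
conf_after = {"spark.executor.instances": 10, "spark.executor.cores": 4}

NUM_TASKS = 200  # hypothetical stage with 200 partitions

for label, conf in (("before", conf_before), ("after", conf_after)):
    slots = task_slots(conf["spark.executor.instances"], conf["spark.executor.cores"])
    print(f"{label}: {slots} concurrent tasks, {waves(NUM_TASKS, slots)} waves")
```

Under these assumed numbers, going from 4 to 10 executors raises the concurrent task count from 16 to 40 and cuts the stage from 13 scheduling waves to 5. Changing executor memory or `spark.network.timeout` leaves the slot count unchanged. On a real cluster the extra executors must actually be available from the cluster manager, and with dynamic allocation enabled the relevant ceiling is `spark.dynamicAllocation.maxExecutors` rather than a fixed instance count.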