Databricks Certified Associate Developer for Apache Spark — Question 203
A Spark developer is developing a Spark application to monitor task performance across a cluster. One of the application’s requirements is to track the maximum processing time for tasks on each worker node and consolidate this information on the driver for further analysis.
Which technique should the developer use to achieve this?
Answer options
- A. Use an RDD action like reduce() to compute the maximum time.
- B. Use an accumulator to record the maximum time on the driver.
- C. Broadcast a variable to share the maximum time among workers.
- D. Configure the Spark UI to automatically collect maximum times.
Correct answer: A
Explanation
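The correct answer is A. reduce() is an RDD action: it combines the elements within each partition on the executors and then merges the partial results on the driver, so calling it with a maximum function returns the longest processing time to the driver for further analysis. Option B is not the best fit: Spark's built-in accumulators are designed for additive aggregates such as counters and sums, so tracking a maximum would require a custom accumulator. Option C does not work because a broadcast variable is a read-only value sent from the driver to the workers; it cannot carry results computed on the workers back to the driver. Option D does not meet the requirement because the Spark UI displays task metrics for inspection but does not return them to the application for analysis.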
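The two-stage behavior of reduce() can be illustrated with a small model in plain Python. This is only a sketch that runs without a Spark installation: the sample durations, the `partitions` layout, and the helper name `reduce_like_spark` are hypothetical and are not part of the Spark API.

```python
from functools import reduce

# Hypothetical sample data: each inner list stands in for one partition
# of task durations (in milliseconds) held by one worker.
partitions = [
    [120, 340, 95],    # partition on worker 1
    [410, 60],         # partition on worker 2
    [230, 385, 150],   # partition on worker 3
]

def reduce_like_spark(parts, op):
    """Plain-Python model of an RDD reduce: local reduce, then driver merge."""
    # Step 1 (executors): each non-empty partition is reduced locally.
    partials = [reduce(op, part) for part in parts if part]
    # Step 2 (driver): the per-partition results are merged into one value.
    return partials, reduce(op, partials)

per_partition_max, overall_max = reduce_like_spark(partitions, max)
print(per_partition_max)  # one maximum per partition
print(overall_max)        # the consolidated maximum on the driver
```

With a real SparkContext named `sc` (an assumption here), the same result would come from something like `sc.parallelize(durations).reduce(max)`. This works because max is associative and commutative, which is what reduce() requires of its function.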