Databricks Certified Associate Developer for Apache Spark — Question 203

A developer is building a Spark application to monitor task performance across a cluster. One requirement is to track the maximum processing time for tasks on each worker node and consolidate this information on the driver for further analysis.

Which technique should the developer use to achieve this?

Answer options

Correct answer: A

Explanation

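The correct answer is A. An RDD action such as reduce() combines values on the executors with an associative, commutative function (here, keeping the larger of two processing times) and returns the single result to the driver, which matches the requirement to consolidate the maximum on the driver. Option B, an accumulator, is a weaker fit: Spark's built-in numeric accumulators compute sums, counts, and averages, so tracking a maximum would require a custom accumulator, and updates made inside transformations may be applied more than once if a task is re-executed. Option C, a broadcast variable, ships a read-only value from the driver to the executors and offers no way to send results back. Option D, configuring the Spark UI, exposes task metrics for inspection but does not return the maximum processing times to the application for further analysis.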
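
The reduce() approach in option A can be sketched roughly as follows. This is a minimal illustration, not the exam's reference solution: it assumes each record is a (worker_host, task_seconds) pair, and the names to_partial, merge_max, max_time_per_worker, and SAMPLE, along with the sample host names and timings, are hypothetical. The combiner is plain Python, so the block checks it locally with functools.reduce; the one function that touches Spark uses only RDD.map() and RDD.reduce().

```python
from functools import reduce as local_reduce


def to_partial(record):
    """Turn one (worker_host, task_seconds) record into a one-entry dict."""
    host, seconds = record
    return {host: seconds}


def merge_max(left, right):
    """Merge two partial results, keeping the larger time for each worker.

    The function is associative and commutative, which RDD.reduce() requires.
    """
    merged = dict(left)
    for host, seconds in right.items():
        merged[host] = max(seconds, merged.get(host, seconds))
    return merged


def max_time_per_worker(rdd):
    """Run the reduction on a PySpark RDD; the result is returned to the driver.

    Assumes the RDD is non-empty and holds (worker_host, task_seconds) pairs.
    """
    return rdd.map(to_partial).reduce(merge_max)


# Hypothetical sample data: (worker_host, task_seconds)
SAMPLE = [
    ("worker-1", 1.2),
    ("worker-2", 3.4),
    ("worker-1", 2.8),
    ("worker-2", 0.9),
]

# Local check of the same combiner, no cluster needed.
local_result = local_reduce(merge_max, map(to_partial, SAMPLE))
```

With a live SparkContext named sc (an assumption here), calling max_time_per_worker(sc.parallelize(SAMPLE)) should yield the same dictionary as local_result. If only the single overall maximum is needed, mapping each record to its time and reducing with max would be enough.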