Databricks Certified Associate Developer for Apache Spark — Question 77
The code block shown below contains an error. The code block is intended to adjust the number of partitions used in wide transformations like join() to 32. Identify the error.
Code block:
spark.conf.set("spark.default.parallelism", "32")
Answer options
- A. spark.default.parallelism is not the right Spark configuration parameter – spark.sql.shuffle.partitions should be used instead.
- B. There is no way to adjust the number of partitions used in wide transformations – it defaults to the number of total CPUs in the cluster.
- C. Spark configuration parameters cannot be set at runtime.
- D. Spark configuration parameters are not set with spark.conf.set().
- E. The second argument should not be the string version of "32" – it should be the integer 32.
Correct answer: A
Explanation
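The correct answer is A. For DataFrame and Spark SQL operations, the number of partitions produced by the shuffle in wide transformations such as join() is controlled by spark.sql.shuffle.partitions, which defaults to 200. spark.default.parallelism sets the default partition count for RDD operations such as join, reduceByKey, and parallelize, so changing it does not affect DataFrame shuffles. Option B is incorrect because the shuffle partition count is configurable and does not default to the number of CPUs in the cluster. Options C and D are incorrect because spark.conf.set() is a valid way to change runtime-settable parameters, and spark.sql.shuffle.partitions is one of them. Option E is incorrect because the string "32" is an accepted value.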
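The corrected call can be sketched as follows. This is a minimal illustration, not part of the original question: the helper name set_shuffle_partitions is hypothetical, and it assumes the object passed in is an active SparkSession (anything exposing conf.set and conf.get behaves the same way here).

```python
def set_shuffle_partitions(spark, num_partitions=32):
    """Set the shuffle partition count for DataFrame/SQL wide transformations.

    `spark` is assumed to be an active SparkSession. The helper name is
    hypothetical and exists only for this illustration.
    Returns the value that the session reports back after the change.
    """
    # spark.sql.shuffle.partitions (not spark.default.parallelism) governs
    # the partition count of shuffles triggered by DataFrame operations.
    spark.conf.set("spark.sql.shuffle.partitions", str(num_partitions))
    return spark.conf.get("spark.sql.shuffle.partitions")


# Usage with a real session (requires pyspark; shown as comments only):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   set_shuffle_partitions(spark, 32)
```

Reading the value back with spark.conf.get() is a quick way to confirm the setting took effect. When adaptive query execution is enabled, Spark may coalesce shuffle partitions at runtime, so the observed partition count after a join can be lower than the configured value.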