Databricks Certified Associate Developer for Apache Spark — Question 38
The code block shown below contains an error. The code block is intended to return a new 12-partition DataFrame from the 8-partition DataFrame storesDF by inducing a shuffle. Identify the error.
Code block:
storesDF.coalesce(12)
Answer options
- A. The coalesce() operation cannot guarantee the number of target partitions – the repartition() operation should be used instead.
- B. The coalesce() operation does not induce a shuffle and cannot increase the number of partitions – the repartition() operation should be used instead.
- C. The coalesce() operation will only work if the DataFrame has been cached to memory – the repartition() operation should be used instead.
- D. The coalesce() operation requires a column by which to partition rather than a number of partitions – the repartition() operation should be used instead.
- E. The number of resulting partitions, 12, is not achievable for an 8-partition DataFrame.
Correct answer: B
Explanation
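The correct answer is B. DataFrame.coalesce() only merges existing partitions through a narrow dependency, so it never induces a shuffle and cannot increase the partition count. If more partitions are requested than the DataFrame already has, the count stays where it is, so storesDF.coalesce(12) still returns 8 partitions. repartition(12) performs a full shuffle and returns exactly 12 partitions. The other options misdescribe coalesce(). It does not require the DataFrame to be cached (C). It takes a number of partitions, not a column (D). 12 partitions are achievable from an 8-partition DataFrame by shuffling with repartition() (E). Option A misses the actual cause: coalesce() cannot shuffle and therefore cannot raise the partition count.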