Databricks Certified Associate Developer for Apache Spark — Question 102
Which of the following code blocks will always return a new 4-partition DataFrame from the 8-partition DataFrame storesDF without inducing a shuffle?
Answer options
- A. storesDF.repartition(4, "sqft")
- B. storesDF.repartition()
- C. storesDF.coalesce(4)
- D. storesDF.repartition(4)
- E. storesDF.coalesce
Correct answer: C
Explanation
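The correct answer is C. coalesce(4) reduces the partition count by merging existing partitions, which is a narrow transformation and does not trigger a shuffle. Options A and D call repartition, which always performs a full shuffle, even when the partition count is being reduced. Option B does not specify a partition count, so it cannot guarantee a 4-partition result. Option E is incomplete: it neither calls coalesce nor specifies the number of partitions.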
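The difference between merging partitions and redistributing rows can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's implementation: the functions coalesce, repartition, and stays_together below are hypothetical stand-ins written for illustration, partitions are modeled as lists of integers, and the sample data is made up.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole partitions into n groups without splitting any."""
    if n >= len(partitions):
        # Like Spark's coalesce, asking for more partitions changes nothing.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each parent partition is assigned, as a unit, to one output partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: deal every row out individually, standing in for a shuffle."""
    out: List[Partition] = [[] for _ in range(n)]
    rows = [row for part in partitions for row in part]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


def stays_together(original: List[Partition], result: List[Partition]) -> bool:
    """True if every original partition ends up wholly inside one result partition."""
    return all(
        any(set(part) <= set(target) for target in result) for part in original
    )


# Hypothetical data: 8 partitions of 3 rows each.
stores = [[i * 10 + j for j in range(3)] for i in range(8)]

coalesced = coalesce(stores, 4)
shuffled = repartition(stores, 4)

print(len(coalesced), stays_together(stores, coalesced))
print(len(shuffled), stays_together(stores, shuffled))
```

Both results have 4 partitions, but only the coalesced one keeps each original partition intact. In this model that is why option C avoids a shuffle while options A and D do not: no row has to leave the group it started in.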