Databricks Certified Associate Developer for Apache Spark — Question 73

Which of the following describes the difference between DataFrame.repartition(n) and DataFrame.coalesce(n)?

Answer options

Correct answer: A

Explanation

The correct answer is A. DataFrame.repartition(n) performs a full shuffle and produces n partitions of roughly equal size; it can either increase or decrease the number of partitions. DataFrame.coalesce(n) avoids a full shuffle by merging existing partitions, which makes it cheaper, but it can only reduce the number of partitions and may leave them unevenly sized. Options B, C, D, and E misdescribe the behavior or the relative cost of these methods.
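
The balanced-versus-uneven distinction can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's actual implementation: the helpers toy_repartition and toy_coalesce are hypothetical names introduced here, partitions are modeled as lists of integers, and the merge order used for coalescing is an assumption chosen for simplicity. In real PySpark code the corresponding calls are df.repartition(n) and df.coalesce(n), and df.rdd.getNumPartitions() reports the resulting partition count.

```python
from typing import List

Partition = List[int]


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: deal every row round-robin into n new partitions."""
    rows = [row for part in partitions for row in part]
    new_parts: List[Partition] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy coalesce: merge whole partitions into n groups without splitting any."""
    if n >= len(partitions):
        # Like DataFrame.coalesce, this toy version never increases the count.
        return [list(part) for part in partitions]
    new_parts: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assumption: neighbouring partitions are merged together.
        new_parts[i * n // len(partitions)].extend(part)
    return new_parts


# Hypothetical skewed input: four partitions holding 7, 1, 1, and 1 rows.
skewed = [list(range(0, 7)), [7], [8], [9]]

repartitioned_sizes = [len(p) for p in toy_repartition(skewed, 2)]
coalesced_sizes = [len(p) for p in toy_coalesce(skewed, 2)]

print(repartitioned_sizes)  # [5, 5]: rows are redistributed evenly
print(coalesced_sizes)      # [8, 2]: existing skew is carried over
```

In this model, repartitioning touches every row, which stands in for the cost of a shuffle, while coalescing moves only whole partitions and therefore inherits whatever skew the input already had.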