Databricks Certified Associate Developer for Apache Spark — Question 102

Which of the following code blocks will always return a new 4-partition DataFrame from the 8-partition DataFrame storesDF without inducing a shuffle?

Answer options

Correct answer: C

Explanation

The correct answer is C: coalesce(4) is a narrow transformation that merges the existing 8 partitions into 4, so it reduces the partition count without triggering a shuffle. Options A, B, and D involve repartitioning, which redistributes rows across partitions and therefore triggers a shuffle. Option E is incomplete because it does not specify the number of partitions.
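
The difference between merging partitions and shuffling rows can be illustrated with a small pure-Python model. This is only a sketch under simplifying assumptions, not Spark's implementation: partitions are plain lists, the helper names coalesce and repartition are hypothetical stand-ins for the DataFrame methods, and locality-aware placement is ignored.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model of a narrow coalesce: merge whole neighbouring
    partitions into n groups. No row is split away from the rest of
    its original partition, so nothing is shuffled."""
    count = len(partitions)
    if n >= count:
        # Like Spark's coalesce, this model never increases the count.
        return [list(p) for p in partitions]
    merged = []
    for i in range(n):
        start = (i * count) // n
        end = ((i + 1) * count) // n
        merged.append([row for p in partitions[start:end] for row in p])
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model of a shuffle: deal every row out round-robin, so rows
    from one input partition end up scattered over many outputs."""
    out: List[Partition] = [[] for _ in range(n)]
    i = 0
    for p in partitions:
        for row in p:
            out[i % n].append(row)
            i += 1
    return out


# Hypothetical stand-in for the 8-partition storesDF: partition p
# holds the rows p*10, p*10 + 1, p*10 + 2.
stores = [[p * 10 + r for r in range(3)] for p in range(8)]

coalesced = coalesce(stores, 4)    # whole partitions are merged in pairs
shuffled = repartition(stores, 4)  # individual rows are redistributed
```

In this model, both calls yield 4 partitions, but only the shuffle version separates rows that started in the same partition. In PySpark itself, the corresponding check would be along the lines of storesDF.coalesce(4).rdd.getNumPartitions(), assuming storesDF is an existing DataFrame.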