Databricks Certified Associate Developer for Apache Spark — Question 162

Which of the following operations will always return a new DataFrame with updated partitions from DataFrame storesDF by inducing a shuffle?

Answer options

Correct answer: C

Explanation

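The correct answer is C, storesDF.repartition(). repartition() always performs a full shuffle: every row is redistributed across the cluster, and the result is a new DataFrame with updated partitions, optionally with a different partition count. Options A, D, and E do not guarantee this. coalesce() can only reduce the number of partitions, and it does so by merging existing partitions without a shuffle. union() concatenates the partitions of its two inputs without a shuffle. intersect() returns the rows common to two DataFrames; any shuffle it triggers is part of computing that result, not a guaranteed repartitioning of storesDF.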
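The difference between the two partition-changing operations can be illustrated with a toy model in plain Python. This is not Spark code: partitions are modeled as lists of rows, and the function names `coalesce` and `repartition` are hypothetical stand-ins. The toy `coalesce` only concatenates whole existing partitions and cannot increase their count. The toy `repartition` moves every row individually, round-robin, which loosely mirrors what Spark does when repartition() is given only a partition count.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy coalesce: merge whole existing partitions into at most n groups.

    Rows are never moved individually; contiguous input partitions are
    concatenated, which is why no shuffle is needed.
    """
    if n >= len(partitions):
        # Without a shuffle, coalesce cannot increase the partition count.
        return [list(p) for p in partitions]
    groups: List[List[T]] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map input partition i to one of n contiguous output groups.
        groups[i * n // len(partitions)].extend(part)
    return groups


def repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy round-robin repartition: every row is redistributed (a shuffle)."""
    out: List[List[T]] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


stores = [[1, 2], [3], [4, 5, 6], [7]]  # 4 input partitions, 7 rows

print([len(p) for p in coalesce(stores, 2)])     # [3, 4]
print([len(p) for p in repartition(stores, 3)])  # [3, 2, 2]
print(len(coalesce(stores, 8)), len(repartition(stores, 8)))  # 4 8
```

In PySpark itself, the same contrast can be observed with `df.rdd.getNumPartitions()` and `df.explain()`: the plan for repartition() contains an `Exchange` (shuffle) node, while the plan for coalesce() shows a `Coalesce` node with no exchange.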