Databricks Certified Associate Developer for Apache Spark — Question 162
Which of the following operations will always return a new DataFrame with updated partitions from DataFrame storesDF by inducing a shuffle?
Answer options
- A. storesDF.coalesce()
- B. storesDF.rdd.getNumPartitions()
- C. storesDF.repartition()
- D. storesDF.union()
- E. storesDF.intersect()
Correct answer: C
Explanation
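The correct answer is C. storesDF.repartition() always performs a full shuffle and returns a new DataFrame whose rows are redistributed across the requested number of partitions, which may be larger or smaller than the original. Option A, coalesce(), can only reduce the number of partitions and does so by merging existing ones, so it avoids a full shuffle. Option B, storesDF.rdd.getNumPartitions(), returns an integer count rather than a DataFrame. Options D and E, union() and intersect(), combine storesDF with a second DataFrame; they are not repartitioning operations, and union() in particular simply concatenates the partitions of its inputs without shuffling.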
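The contrast between coalesce() and repartition() can be illustrated with a toy model in plain Python. This is only a sketch of the partition mechanics, not Spark code: the helpers toy_coalesce and toy_repartition and the sample data in stores are hypothetical names invented for this illustration, and the round-robin placement is a simplification of what Spark does when repartitioning by number.

```python
from typing import List

Partitions = List[List[int]]  # each inner list stands for one partition's rows


def toy_coalesce(parts: Partitions, n: int) -> Partitions:
    """Merge whole partitions into at most n groups (no row-level shuffle)."""
    if n >= len(parts):
        # Like coalesce(), this toy version cannot increase the partition count.
        return [list(p) for p in parts]
    merged: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Every row of an input partition lands in the same output partition.
        merged[i * n // len(parts)].extend(part)
    return merged


def toy_repartition(parts: Partitions, n: int) -> Partitions:
    """Redistribute every row across exactly n partitions (a full shuffle)."""
    rows = [row for part in parts for row in part]
    shuffled: Partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        # Round-robin placement: rows from one input partition get split up.
        shuffled[i % n].append(row)
    return shuffled


# Hypothetical stand-in for storesDF: 10 rows spread over 4 partitions.
stores = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]

print(toy_coalesce(stores, 2))          # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(len(toy_coalesce(stores, 8)))     # 4 (count cannot grow)
print(toy_repartition(stores, 2))       # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
print(len(toy_repartition(stores, 8)))  # 8 (count can grow)
```

In the toy model, only toy_repartition moves individual rows between partitions and can produce more partitions than it started with, which mirrors why repartition() is the option that always induces a shuffle.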