Databricks Certified Associate Developer for Apache Spark — Question 18
Which of the following operations is most likely to result in a shuffle?
Answer options
- A. DataFrame.join()
- B. DataFrame.filter()
- C. DataFrame.union()
- D. DataFrame.where()
- E. DataFrame.drop()
Correct answer: A
Explanation
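The correct answer is A. DataFrame.join() is a wide transformation: rows with matching join keys must end up in the same partition, so Spark usually repartitions one or both DataFrames by key, which is a shuffle. The exception is a broadcast join, where a small DataFrame is copied to every executor and the larger one stays in place, which is why the question says "most likely". The other options are narrow transformations: filter() and where() (aliases of each other) evaluate a predicate row by row within each partition, drop() removes columns without moving rows, and union() combines the partitions of both inputs as-is. None of these requires moving data between partitions.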
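The difference between narrow and wide operations can be sketched with a toy model in plain Python. This is an illustration only, not Spark's implementation: a "DataFrame" is modelled as a list of partitions holding (key, value) rows, keys are assumed to be integers, and the names filter_rows, union, shuffle_by_key, join, orders, and users are all hypothetical.

```python
from collections import defaultdict

# Toy model (assumption): a "DataFrame" is a list of partitions,
# and each partition is a list of (integer_key, value) rows.

def filter_rows(partitions, predicate):
    """Narrow: each output partition is computed from exactly one input partition."""
    return [[row for row in part if predicate(row)] for part in partitions]

def union(left, right):
    """Narrow: the result just lists the partitions of both inputs, unchanged."""
    return left + right

def shuffle_by_key(partitions, num_partitions):
    """Wide: send every row to the partition chosen by its key.

    Returns the new partitions and how many rows changed partition index.
    """
    out = [[] for _ in range(num_partitions)]
    moved = 0
    for idx, part in enumerate(partitions):
        for key, value in part:
            target = key % num_partitions  # simple hash partitioning on integer keys
            out[target].append((key, value))
            if target != idx:
                moved += 1
    return out, moved

def join(left, right, num_partitions):
    """Inner join on key: shuffle both sides so equal keys are co-located, then match locally."""
    left_s, moved_left = shuffle_by_key(left, num_partitions)
    right_s, moved_right = shuffle_by_key(right, num_partitions)
    result = []
    for lpart, rpart in zip(left_s, right_s):
        index = defaultdict(list)
        for key, rval in rpart:
            index[key].append(rval)
        result.append(
            [(key, lval, rval) for key, lval in lpart for rval in index.get(key, [])]
        )
    return result, moved_left + moved_right

# Hypothetical sample data: two partitions per side.
orders = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
users = [[(3, "x")], [(1, "y"), (2, "z")]]

filtered = filter_rows(orders, lambda row: row[0] == 1)  # rows stay in their partition
combined = union(orders, users)                          # partitions are concatenated
joined, rows_moved = join(orders, users, 2)              # rows cross partition boundaries
print(rows_moved)
```

In the toy model, filter_rows and union never look outside a single partition, while join has to relocate rows before it can match keys. In a real Spark session, the comparable check is to call DataFrame.explain() on each result and look for an Exchange operator in the physical plan, which marks a shuffle.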