Databricks Certified Associate Developer for Apache Spark — Question 18

Which of the following operations is most likely to result in a shuffle?

Answer options

Correct answer: A

Explanation

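The correct answer is A, DataFrame.join(). A join has to bring rows with matching keys from both DataFrames into the same partition, which usually means redistributing data across the cluster, that is, a shuffle. Spark can avoid this when one side is small enough to be broadcast to every executor, which is why a shuffle is likely rather than guaranteed. The other options, such as filter, where, and drop, are narrow transformations: each output partition is computed from a single input partition, so no data moves between partitions and no shuffle results.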
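
The difference between the two kinds of operation can be sketched with a toy model in plain Python. This is not Spark code and does not reflect Spark's actual implementation; the function names (narrow_filter, shuffle_by_key, shuffle_join), the sample data, and the use of hash(key) % num_partitions as the partitioner are all assumptions made for illustration. Partitions are modelled as lists of (key, value) rows.

```python
from collections import defaultdict

# Toy data: each inner list stands for one partition of (key, value) rows.
# These values are made up for the illustration.
left = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]
right = [[(4, "w"), (3, "x")], [(2, "y"), (1, "z")]]


def narrow_filter(partitions, predicate):
    """Filter every partition independently; no row leaves its partition."""
    return [[row for row in part if predicate(row)] for part in partitions]


def shuffle_by_key(partitions, num_partitions):
    """Redistribute rows so that equal keys land in the same partition.

    Returns the new partitions and the number of rows that changed partition.
    """
    out = [[] for _ in range(num_partitions)]
    moved = 0
    for src_index, part in enumerate(partitions):
        for key, value in part:
            dst_index = hash(key) % num_partitions  # hypothetical partitioner
            if dst_index != src_index:
                moved += 1
            out[dst_index].append((key, value))
    return out, moved


def shuffle_join(left_parts, right_parts, num_partitions):
    """Inner join on key: shuffle both sides, then join partition by partition."""
    left_shuffled, left_moved = shuffle_by_key(left_parts, num_partitions)
    right_shuffled, right_moved = shuffle_by_key(right_parts, num_partitions)
    joined = []
    for lpart, rpart in zip(left_shuffled, right_shuffled):
        # Build a per-partition lookup of the right side, then probe it.
        lookup = defaultdict(list)
        for key, value in rpart:
            lookup[key].append(value)
        for key, lvalue in lpart:
            for rvalue in lookup.get(key, []):
                joined.append((key, lvalue, rvalue))
    return joined, left_moved + right_moved
```

In this sketch, narrow_filter(left, lambda row: row[0] % 2 == 0) keeps each surviving row in the partition it started in, while shuffle_join(left, right, 2) has to move rows between partitions before any matching keys can be paired. That row movement is what the explanation above calls a shuffle.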