Which of the following operations is most likely to result in a shuffle?

Question

Accepted Answer

Correct answer: A, DataFrame.join(). A join has to bring rows with the same key into the same partition, so Spark usually redistributes data across the cluster, and that redistribution is a shuffle. The exception is a broadcast join, where one side is small enough to be copied to every executor. The other options, such as filter, where, and drop, are narrow transformations: each output partition depends on a single input partition, so no data moves between partitions and no shuffle occurs.
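The difference between a narrow operation and a shuffle can be sketched with a toy model in plain Python. This is not Spark code: partitions are modeled as lists of (key, value) tuples, and the names narrow_filter, hash_repartition, and shuffle_join are hypothetical helpers invented for this illustration. In real Spark, a shuffle shows up as an Exchange operator in the output of DataFrame.explain().

```python
from collections import defaultdict

def narrow_filter(partitions, predicate):
    """Filter each partition on its own; no row ever changes partition."""
    return [[row for row in part if predicate(row)] for part in partitions]

def hash_repartition(partitions, num_partitions):
    """Send each (key, value) row to partition key % num_partitions.

    Returns the new partitions and how many rows had to move,
    which stands in for shuffle traffic in this toy model.
    """
    out = [[] for _ in range(num_partitions)]
    moved = 0
    for src, part in enumerate(partitions):
        for row in part:
            dst = row[0] % num_partitions  # assumes integer keys
            if dst != src:
                moved += 1
            out[dst].append(row)
    return out, moved

def shuffle_join(left, right, num_partitions):
    """Inner join on the key: co-locate matching keys first, then join locally."""
    left_parts, moved_left = hash_repartition(left, num_partitions)
    right_parts, moved_right = hash_repartition(right, num_partitions)
    joined = []
    for lpart, rpart in zip(left_parts, right_parts):
        index = defaultdict(list)
        for key, value in rpart:
            index[key].append(value)
        for key, value in lpart:
            for other in index.get(key, []):
                joined.append((key, value, other))
    return joined, moved_left + moved_right

# Two small datasets, each spread over two partitions.
orders = [[(1, "book"), (2, "pen")], [(3, "ink"), (1, "lamp")]]
users = [[(3, "cy"), (2, "bo")], [(1, "al")]]

filtered = narrow_filter(orders, lambda row: row[0] == 1)
joined, rows_moved = shuffle_join(orders, users, 2)
print(filtered)
print(sorted(joined), rows_moved)
```

In this model the filter leaves every surviving row in its original partition, while the join has to relocate rows before matching keys can meet. That is the same reason the join is the most likely of the listed operations to shuffle.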

Databricks Certified Associate Developer for Apache Spark — Question 18

Answer options


Explanation