Databricks Certified Associate Developer for Apache Spark — Question 144

Which of the following operations is least likely to result in a shuffle?

Answer options

Correct answer: B

Explanation

The correct answer is B, DataFrame.filter(). Filtering is a narrow transformation: each partition can be evaluated independently, so no rows have to move between partitions. In contrast, join, orderBy, distinct, and intersect are wide transformations that usually require a shuffle to bring matching or related rows from different partitions together.
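
The difference can be illustrated with a toy model in plain Python. This is only a sketch, not Spark's implementation: a "DataFrame" is assumed to be a list of partitions, and the helper names narrow_filter and wide_distinct are hypothetical. The filter touches each partition in isolation, while the distinct has to route equal values to the same output partition before it can deduplicate them, which is the step Spark would perform as a shuffle.

```python
# Toy model (assumption): a dataset is a list of partitions, each a list of ints.
# Illustration only; Spark's real execution engine is far more involved.

def narrow_filter(partitions, predicate):
    """Filter every partition on its own; no row leaves its partition."""
    return [[row for row in part if predicate(row)] for part in partitions]


def wide_distinct(partitions, num_partitions):
    """Deduplicate by first sending equal rows to the same output partition."""
    shuffled = [[] for _ in range(num_partitions)]
    rows_moved = 0
    for src_index, part in enumerate(partitions):
        for row in part:
            # Simple modulo routing as a stand-in for a hash partitioner.
            dest = row % num_partitions
            if dest != src_index:
                rows_moved += 1  # this row crossed a partition boundary
            shuffled[dest].append(row)
    # Duplicates now sit in the same partition and can be dropped locally.
    deduped = [sorted(set(part)) for part in shuffled]
    return deduped, rows_moved


partitions = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]

filtered = narrow_filter(partitions, lambda x: x % 2 == 0)
distinct_parts, rows_moved = wide_distinct(partitions, num_partitions=3)

print(filtered)        # partition layout is preserved
print(distinct_parts)  # rows were redistributed before deduplication
print(rows_moved)      # number of rows that changed partition
```

In this model the filter never moves a row, whereas the distinct cannot be correct without moving rows, because copies of the same value start out in different partitions.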