Databricks Certified Associate Developer for Apache Spark — Question 144
Which of the following operations is least likely to result in a shuffle?
Answer options
- A. DataFrame.join()
- B. DataFrame.filter()
- C. DataFrame.orderBy()
- D. DataFrame.distinct()
- E. DataFrame.intersect()
Correct answer: B
Explanation
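The correct answer is B, DataFrame.filter(). Filtering is a narrow transformation: each row is tested against the predicate inside the partition where it already lives, so no data has to move between partitions. In contrast, join, orderBy, distinct, and intersect are wide transformations that usually require a shuffle, because rows with matching keys, duplicate rows, or adjacent sort positions may sit in different partitions and must be brought together first. The question says "least likely" because some of these can occasionally avoid a full shuffle (for example, a join against a small table that Spark broadcasts), whereas a filter does not need one.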
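The difference can be sketched with a toy model in plain Python. This is not Spark code: a "DataFrame" is assumed to be a list of partitions, and the function names (narrow_filter, local_distinct, shuffle_by_hash, global_distinct) are hypothetical helpers invented for illustration. It only shows why a filter can work partition by partition while a correct distinct has to move rows between partitions first.

```python
# Toy model, NOT the Spark API: a "DataFrame" is a list of partitions,
# and each partition is a list of rows. All function names are hypothetical.

def narrow_filter(partitions, predicate):
    """Filter each partition on its own; no row leaves its partition."""
    return [[row for row in part if predicate(row)] for part in partitions]

def local_distinct(partitions):
    """Deduplicate inside each partition only (no data movement)."""
    return [list(dict.fromkeys(part)) for part in partitions]

def shuffle_by_hash(partitions, num_partitions):
    """Send every row to the partition chosen by its hash (a shuffle)."""
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for row in part:
            shuffled[hash(row) % num_partitions].append(row)
    return shuffled

def global_distinct(partitions, num_partitions):
    """Correct distinct: co-locate equal rows first, then deduplicate."""
    return local_distinct(shuffle_by_hash(partitions, num_partitions))

partitions = [[1, 2, 2, 3], [3, 4, 4, 5]]

# Filter: each output partition is built from exactly one input partition.
filtered = narrow_filter(partitions, lambda x: x % 2 == 1)

# Distinct without a shuffle: the value 3 survives in both partitions.
local_only = local_distinct(partitions)

# Distinct with a shuffle: equal values meet in one partition and collapse.
deduped = global_distinct(partitions, 2)
```

In this model, filtered keeps the original partition layout, local_only still contains the value 3 twice because the two copies never meet, and deduped contains each value once. In real Spark, the physical plan printed by DataFrame.explain() is the place to check whether a given query actually shuffles.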