Databricks Certified Associate Developer for Apache Spark — Question 37
Which of the following operations can be used to return a new DataFrame from DataFrame storesDF without inducing a shuffle?
Answer options
- A. storesDF.intersect()
- B. storesDF.repartition(1)
- C. storesDF.union()
- D. storesDF.coalesce(1)
- E. storesDF.rdd.getNumPartitions()
Correct answer: D
Explanation
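The correct answer is D. coalesce(1) returns a new DataFrame with fewer partitions by merging existing partitions rather than redistributing individual rows, so it does not trigger a shuffle. Option B, repartition(1), always performs a full shuffle, even when it reduces the number of partitions. Option A, intersect(), has to compare rows across partitions to find those present in both DataFrames, which requires a shuffle. Option C, union(), is a narrow operation that appends the partitions of one DataFrame to those of another and does not shuffle by itself, but like intersect() it needs a second DataFrame as an argument, so the call as written is incomplete. Option E returns the number of partitions as an integer and does not create a new DataFrame.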