Databricks Certified Associate Developer for Apache Spark — Question 37

Which of the following operations can be used to return a new DataFrame from DataFrame storesDF without inducing a shuffle?

Answer options

Correct answer: D

Explanation

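The correct answer is D. The coalesce operation returns a new DataFrame with fewer partitions by merging existing partitions rather than redistributing individual rows, so it is a narrow transformation that does not trigger a full shuffle and preserves existing data locality. Note that coalesce can only reduce the partition count: if more partitions are requested than the DataFrame already has, the count stays the same. In contrast, options A, B, and C can induce a shuffle because they merge or repartition data across partitions. Option E only returns the number of partitions and does not create a new DataFrame.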
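
The difference between merging partitions and redistributing rows can be sketched with a toy model in plain Python. This is not Spark's implementation: the helper names (coalesce, repartition, stays_together), the contiguous grouping rule, the round-robin rule, and the sample store IDs are all assumptions made for illustration. Each inner list stands for one partition of storesDF.

```python
from typing import List

Partition = List[str]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole existing partitions into at most n groups.

    Rows that started in the same partition end up in the same output
    partition, which is why no shuffle is needed.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    n = min(n, len(partitions))  # the partition count can only go down
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assumed grouping rule: neighbouring partitions are merged together.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: deal every row round-robin into n new partitions.

    Each row may move to a different partition, which models a shuffle.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    out: List[Partition] = [[] for _ in range(n)]
    k = 0
    for part in partitions:
        for row in part:
            out[k % n].append(row)
            k += 1
    return out


def stays_together(before: List[Partition], after: List[Partition]) -> bool:
    """True if every original partition lies entirely inside one new partition."""
    return all(any(set(p) <= set(q) for q in after) for p in before)


# Hypothetical layout: four partitions holding six store IDs.
stores = [["s1", "s2"], ["s3"], ["s4", "s5"], ["s6"]]

print(coalesce(stores, 2))
print(repartition(stores, 2))
print(stays_together(stores, coalesce(stores, 2)))
print(stays_together(stores, repartition(stores, 2)))
```

In this model the coalesced result keeps every original partition intact inside one output partition, while the round-robin result splits rows that used to sit together. That splitting is the row movement a shuffle performs across executors in a real cluster.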