Databricks Certified Associate Developer for Apache Spark — Question 171
The code block shown below should return a new 4-partition DataFrame from the 8-partition DataFrame storesDF without inducing a shuffle. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
_1_._2_(_3_)
Answer options
- A. 1. storesDF 2. coalesce 3. Nothing
- B. 1. storesDF 2. coalesce 3. 4
- C. 1. storesDF 2. coalesce 3. 4, "storeId"
- D. 1. storesDF 2. coalesce 3. "storeId"
Correct answer: B
Explanation
The correct answer is B. coalesce reduces the number of partitions by merging existing partitions, which is a narrow transformation and so does not trigger a shuffle. Passing 4 as the argument returns a 4-partition DataFrame from the 8-partition storesDF. Option A is incorrect because coalesce requires the target number of partitions, so the blank cannot be left empty. Options C and D are incorrect because coalesce accepts only an integer partition count, not a column name. repartition is the method that accepts partitioning columns, and it induces a shuffle.
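
The behavior described above can be checked with a small PySpark sketch. This is an illustration under stated assumptions, not part of the original question: the helper `coalesce_counts`, the function `demo`, the app name, and the stand-in data for storesDF (built with `spark.range`) are all hypothetical, and a local Spark installation is assumed.

```python
def coalesce_counts(df, target):
    """Return (partitions before, partitions after) for df.coalesce(target)."""
    before = df.rdd.getNumPartitions()
    after = df.coalesce(target).rdd.getNumPartitions()
    return before, after


def demo():
    # Imported lazily so the helper above has no hard dependency on PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("coalesce-demo")  # hypothetical app name
        .getOrCreate()
    )
    # Hypothetical stand-in for storesDF: 80 rows spread over 8 partitions.
    storesDF = spark.range(0, 80, 1, numPartitions=8).withColumnRenamed("id", "storeId")

    # Partition counts before and after answer B's storesDF.coalesce(4).
    print(coalesce_counts(storesDF, 4))

    # The physical plan should show a Coalesce node and no Exchange (shuffle).
    storesDF.coalesce(4).explain()
    spark.stop()
```

Calling `demo()` in an environment with PySpark installed prints the partition counts and the physical plan. Replacing `coalesce(4)` with `repartition(4)` in the same sketch should add an Exchange node to the plan, which is the shuffle the question asks to avoid.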