Databricks Certified Associate Developer for Apache Spark — Question 179

A data engineer has been inspecting one of their scheduled Spark applications, which has started running slowly in its most recent executions. Looking at the query plan, the engineer notices that Adaptive Query Execution (AQE) is no longer converting one of the join operations from a sort-merge join to a broadcast join, causing extra shuffling in the Spark job.

What explains the change in the behaviour of the application?

Answer options

Correct answer: A

Explanation

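The correct answer is A. When AQE re-optimizes a query, it uses runtime statistics collected from completed shuffle stages. It converts a sort-merge join to a broadcast hash join only when one side of the join is no larger than the broadcast threshold. In Spark 3.2 and later this is `spark.sql.adaptive.autoBroadcastJoinThreshold`, which defaults to the value of `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). If the smaller side of the join has grown past that threshold, AQE keeps the sort-merge join and both sides must be shuffled, which explains the extra shuffling. Option B is incorrect because AQE relies on runtime statistics rather than on stored table statistics, so stale statistics alone would not stop this conversion. Option C is incorrect because a version upgrade does not necessarily disable AQE. Option D is incorrect because skewed keys hurt performance, and AQE handles them by splitting skewed partitions, but skew does not change which join strategy is selected.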
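To make the threshold rule concrete, here is a simplified, hypothetical Python model of the size check. It is not Spark's implementation: the names `can_broadcast`, `planned_join`, and `DEFAULT_THRESHOLD_BYTES` are illustrative, and it assumes an inner join, where either side may be broadcast, so only the smaller side's runtime size is compared with the threshold.

```python
# Hypothetical model of AQE's broadcast decision (illustrative only, not Spark code).

MB = 1024 * 1024
DEFAULT_THRESHOLD_BYTES = 10 * MB  # Spark's documented default broadcast threshold


def can_broadcast(runtime_size_bytes: int, threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> bool:
    """True if a join side, measured at runtime, fits under the threshold.

    A threshold of -1 disables broadcasting, since no non-negative size is <= -1.
    """
    return 0 <= runtime_size_bytes <= threshold_bytes


def planned_join(left_bytes: int, right_bytes: int,
                 threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> str:
    """Pick a join strategy from runtime sizes (assumes an inner join)."""
    if can_broadcast(min(left_bytes, right_bytes), threshold_bytes):
        return "BroadcastHashJoin"
    return "SortMergeJoin"


# Earlier runs: the smaller side is 8 MB, so AQE can broadcast it.
print(planned_join(left_bytes=2_000 * MB, right_bytes=8 * MB))   # BroadcastHashJoin
# Recent runs: the same side has grown to 12 MB and no longer fits.
print(planned_join(left_bytes=2_000 * MB, right_bytes=12 * MB))  # SortMergeJoin
```

In a real job, the values to inspect would be the `spark.sql.adaptive.autoBroadcastJoinThreshold` and `spark.sql.autoBroadcastJoinThreshold` settings, together with the data sizes the Spark UI reports for the join's input stages.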