Databricks Certified Associate Developer for Apache Spark — Question 41

The code block below contains a logical error that makes it inefficient. The code block is intended to efficiently perform a broadcast join of the DataFrame storesDF with the much larger DataFrame employeesDF on the key column storeId. Identify the logical error.
Code block:
storesDF.join(broadcast(employeesDF), "storeId")

Answer options

Correct answer: A

Explanation

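The correct answer is A. The intention is to broadcast the smaller DataFrame, storesDF, not the much larger employeesDF. In a broadcast join, Spark collects the broadcast DataFrame on the driver and ships a full copy of it to every executor, so broadcasting the larger DataFrame moves far more data over the network and increases memory pressure, which is the source of the inefficiency. Option B is incorrect because an explicit broadcast() hint can still be necessary, for example when Spark's size estimate for the smaller DataFrame exceeds spark.sql.autoBroadcastJoinThreshold and it would not be broadcast automatically. Options C and D misinterpret how the broadcast operation should be applied or configured. Option E is also incorrect because the problem lies in which DataFrame is broadcast, not in how many DataFrames are broadcast.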