Question

The code block below contains a logical error that makes it inefficient. It is intended to perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF on the key column storeId. Identify the logical error.
Code block:
storesDF.join(broadcast(employeesDF), "storeId")

Accepted Answer

Correct answer: A. The larger DataFrame employeesDF is being broadcasted rather than the smaller DataFrame storesDF.

A broadcast join sends a full copy of the broadcast DataFrame to every executor, so it pays off only when the broadcast side is the small one. The code broadcasts the much larger employeesDF instead of storesDF, which is what makes it inefficient. Option B is incorrect because broadcast() can still be necessary in certain scenarios, while C and D misinterpret how the broadcast operation should be applied or configured. Option E is also incorrect because the problem lies in which DataFrame is broadcast, not in how many DataFrames are broadcast.
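Following the accepted answer, the fix is to wrap the smaller DataFrame in the hint instead, for example employeesDF.join(broadcast(storesDF), "storeId"). To see why the choice of side matters, here is a minimal pure-Python sketch of the idea behind a broadcast hash join. It is a toy model and not Spark's actual implementation: the rows, the three "partitions", and the helper names broadcast_hash_join and rows_copied are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Toy inner join: hash the small side once, probe it from each partition."""
    # Build the lookup table from the small (broadcast) side.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:  # rows of the large side stay in place
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


def rows_copied(broadcast_rows, num_executors):
    """Rows shipped over the network if every executor gets a full copy."""
    return len(broadcast_rows) * num_executors


# Hypothetical data: 2 stores, 6 employees spread over 3 partitions.
stores = [
    {"storeId": 1, "city": "Austin"},
    {"storeId": 2, "city": "Boston"},
]
employee_partitions = [
    [{"storeId": 1, "employee": "Ana"}, {"storeId": 2, "employee": "Ben"}],
    [{"storeId": 1, "employee": "Cy"}, {"storeId": 3, "employee": "Dee"}],
    [{"storeId": 2, "employee": "Eli"}, {"storeId": 2, "employee": "Flo"}],
]
employees = [row for part in employee_partitions for row in part]

joined = broadcast_hash_join(employee_partitions, stores, "storeId")

# Cost of each choice, assuming one executor per partition.
cost_small_side = rows_copied(stores, len(employee_partitions))
cost_large_side = rows_copied(employees, len(employee_partitions))
print(len(joined), cost_small_side, cost_large_side)
```

In this sketch, broadcasting the small side copies far fewer rows than broadcasting the large side, and the gap grows with the size difference between the two DataFrames and with the number of executors.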

Databricks Certified Associate Developer for Apache Spark — Question 41
