Which statement describes the correct use of pyspark.sql.functions.broadcast?

Question

Accepted Answer

Correct answer: D. It marks a DataFrame as small enough to store in memory on all executors, allowing a broadcast join.

pyspark.sql.functions.broadcast takes a DataFrame and returns it with a hint that tells the optimizer the DataFrame is small enough to be copied to every executor. Spark can then perform a broadcast join, avoiding a shuffle of the larger side. Options A and B are wrong because they describe the function as acting on a column, whereas it takes a whole DataFrame. Option C is wrong because it suggests the function caches the table for future queries; broadcast only influences how the join is planned and does not persist anything.

Databricks Certified Data Engineer Professional — Question 191
