Databricks Certified Data Engineer Professional — Question 191

Which statement describes the correct use of pyspark.sql.functions.broadcast?

Answer options

Correct answer: D

Explanation

The correct answer is D: pyspark.sql.functions.broadcast marks a DataFrame as small enough to fit in memory on every executor, which hints to Spark that it can use a broadcast join (sending a full copy of the small DataFrame to each executor instead of shuffling both sides of the join). Options A and B are incorrect because they describe the function as operating on a column, whereas it takes and returns a DataFrame. Option C is incorrect because it suggests the function caches the table for future queries; broadcast only influences how a join is planned and does not cache anything.