Which statement describes the correct use of pyspark.sql.functions.broadcast?

Accepted Answer

Correct answer: D. It marks a DataFrame as small enough to store in memory on all executors, allowing a broadcast join.

The correct answer is D because pyspark.sql.functions.broadcast is a join hint. It tells Spark's optimizer that the DataFrame is small enough to be copied in full to every executor, so the join can run as a broadcast join and avoid shuffling the larger DataFrame. Options A and B incorrectly apply the function to columns instead of DataFrames, while C and E misrepresent its behavior regarding caching and storage locations.
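A broadcast join can be pictured as a hash join in which the small side is built into an in-memory lookup table and a copy is shipped to every executor. Each partition of the large side is then joined locally, without being shuffled. The sketch below is a plain-Python simulation of that idea under stated assumptions, not Spark's implementation: partitions are modeled as lists of (key, value) tuples, and build_broadcast_table and broadcast_hash_join are hypothetical helper names. In PySpark itself, the hint is applied with from pyspark.sql.functions import broadcast followed by large_df.join(broadcast(small_df), "customer_id").

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, str]


def build_broadcast_table(small_rows: List[Row]) -> Dict[Hashable, List[str]]:
    """Build an in-memory hash table from the small (broadcast) side."""
    table: Dict[Hashable, List[str]] = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    return dict(table)


def broadcast_hash_join(
    large_partitions: List[List[Row]], small_rows: List[Row]
) -> List[Tuple[Hashable, str, str]]:
    """Inner-join every partition of the large side against the small side.

    Hypothetical simulation: rows of the large side never move between
    partitions; each "executor" probes its own copy of the hash table.
    """
    table = build_broadcast_table(small_rows)
    results: List[Tuple[Hashable, str, str]] = []
    for partition in large_partitions:
        # Shallow copy standing in for the copy each executor receives.
        local_table = dict(table)
        for key, large_value in partition:
            # Unmatched keys yield nothing, as in an inner join.
            for small_value in local_table.get(key, []):
                results.append((key, large_value, small_value))
    return results


# Usage: two partitions of "orders" keyed by customer_id, one small lookup table.
orders = [
    [(1, "order-a"), (2, "order-b")],  # partition 0
    [(3, "order-c"), (1, "order-d")],  # partition 1
]
countries = [(1, "US"), (2, "DE")]
print(broadcast_hash_join(orders, countries))
# → [(1, 'order-a', 'US'), (2, 'order-b', 'DE'), (1, 'order-d', 'US')]
```

Here order-c has no matching customer_id and is dropped. The design point mirrors what the hint buys in Spark: only the small table is replicated, and the large table is processed where it already sits.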

Databricks Certified Data Engineer Professional — Question 49
