Databricks Certified Associate Developer for Apache Spark — Question 165
Which of the following Spark properties is used to configure whether DataFrames found to be below a certain size threshold at runtime will be automatically broadcast?
Answer options
- A. spark.sql.broadcastTimeout
- B. spark.sql.autoBroadcastJoinThreshold
- C. spark.sql.shuffle.partitions
- D. spark.sql.inMemoryColumnarStorage.batchSize
- E. spark.sql.adaptive.localShuffleReader.enabled
Correct answer: B
Explanation
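The correct answer is B. 'spark.sql.autoBroadcastJoinThreshold' sets the maximum size, in bytes, of a table that Spark will automatically broadcast to all worker nodes when performing a join. The default is 10 MB (10485760 bytes), and setting the value to -1 disables automatic broadcasting. Option A sets how long Spark waits for a broadcast to complete, not whether a broadcast is chosen. Option C sets the number of partitions used when shuffling data for joins and aggregations. Option D sets the batch size for in-memory columnar caching. Option E controls whether adaptive query execution may use a local shuffle reader. None of these four determines whether a DataFrame is broadcast based on its size.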
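To make the threshold semantics concrete, here is a minimal sketch in plain Python. It is a simplified model of the size comparison, not Spark's actual source code: the function name is_auto_broadcast_candidate and the two table sizes are hypothetical, chosen only for illustration. The only values taken from Spark's documentation are the 10 MB default and the meaning of -1.

```python
# Simplified model of the size check behind spark.sql.autoBroadcastJoinThreshold.
# This is an illustration only; it is not Spark's implementation.

DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024  # Spark's documented default (10 MB)


def is_auto_broadcast_candidate(estimated_size_bytes: int,
                                threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> bool:
    """Return True if a join side of the given estimated size fits the threshold.

    A threshold of -1 disables automatic broadcasting in this model,
    because no non-negative size can be less than or equal to -1.
    """
    return 0 <= estimated_size_bytes <= threshold_bytes


small_dim_table = 2 * 1024 * 1024   # hypothetical 2 MB lookup table
large_fact_table = 5 * 1024 ** 3    # hypothetical 5 GB fact table

print(is_auto_broadcast_candidate(small_dim_table))       # True
print(is_auto_broadcast_candidate(large_fact_table))      # False
print(is_auto_broadcast_candidate(small_dim_table, -1))   # False: broadcasting disabled
```

In a live session the property itself can be changed with spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1"), assuming a SparkSession bound to the name spark. The size Spark compares against the threshold comes from its own plan statistics, so real behavior may differ from this model when those estimates are missing or inaccurate.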