Databricks Certified Associate Developer for Apache Spark — Question 216
Which of the following Spark properties is used to configure the maximum size of an automatically broadcasted DataFrame when performing a join?
Answer options
- A. spark.sql.broadcastTimeout
- B. spark.sql.autoBroadcastJoinThreshold
- C. spark.sql.shuffle.partitions
- D. spark.sql.inMemoryColumnarStorage.batchSize
- E. spark.sql.adaptive.skewedJoin.enabled
Correct answer: B
Explanation
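The correct answer is B. spark.sql.autoBroadcastJoinThreshold sets the maximum size, in bytes, of a table that Spark will automatically broadcast to all worker nodes when performing a join. The default is 10485760 (10 MB), and setting it to -1 disables automatic broadcasting. Spark compares this threshold against the estimated size of a join side. When that side is small enough, Spark ships a full copy of it to every node instead of shuffling both sides of the join.

The other options control unrelated behavior. spark.sql.broadcastTimeout (A) is how long, in seconds, Spark waits for a broadcast to complete (default 300). spark.sql.shuffle.partitions (C) is the number of partitions used when shuffling data for joins and aggregations (default 200). spark.sql.inMemoryColumnarStorage.batchSize (D) is the number of rows per batch when caching data in columnar format (default 10000). Option E refers to adaptive query execution's skew-join optimization; the actual property name is spark.sql.adaptive.skewJoin.enabled. None of these sets the size limit for automatic broadcasting.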
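The size check behind this property can be sketched in plain Python. This is a simplified, hypothetical model for illustration only, not Spark's planner code. The function `would_auto_broadcast` and the example sizes are made up. The model captures only the threshold comparison, the byte units, and the effect of -1. The real planner works from estimated statistics and also considers join type and hints.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold: 10 MB, expressed in bytes.
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024  # 10485760


def would_auto_broadcast(estimated_size_bytes: int,
                         threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> bool:
    """Hypothetical, simplified model of the automatic-broadcast size check.

    A join side qualifies only if its estimated size is non-negative and does
    not exceed the threshold. No non-negative size is <= -1, so a threshold
    of -1 disables automatic broadcasting.
    """
    return 0 <= estimated_size_bytes <= threshold_bytes


# In a real PySpark session the threshold would be changed with, e.g.:
#   spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024)

small_dim = 2 * 1024 * 1024      # hypothetical ~2 MB lookup table
large_fact = 500 * 1024 * 1024   # hypothetical ~500 MB fact table

print(would_auto_broadcast(small_dim))                       # True: under 10 MB default
print(would_auto_broadcast(large_fact))                      # False: over 10 MB default
print(would_auto_broadcast(large_fact, 1024 * 1024 * 1024))  # True: threshold raised to 1 GB
print(would_auto_broadcast(small_dim, -1))                   # False: -1 disables broadcasting
```

Raising the threshold lets larger tables be broadcast, at the cost of driver and executor memory. Setting it to -1 turns automatic broadcasting off entirely.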