Databricks Certified Associate Developer for Apache Spark — Question 52
The default value of spark.sql.shuffle.partitions is 200. Which of the following describes what that means?
Answer options
- A. By default, all DataFrames in Spark will be split to perfectly fill the memory of 200 executors.
- B. By default, new DataFrames created by Spark will be split to perfectly fill the memory of 200 executors.
- C. By default, Spark will only read the first 200 partitions of DataFrames to improve speed.
- D. By default, all DataFrames in Spark, including existing DataFrames, will be split into 200 unique segments for parallelization.
- E. By default, DataFrames will be split into 200 unique partitions when data is being shuffled.
Correct answer: E
Explanation
The correct answer is E. `spark.sql.shuffle.partitions` sets how many partitions Spark produces when it shuffles DataFrame data for wide transformations such as joins and aggregations (for example, `groupBy`). By default, the shuffled output is split into 200 partitions that can be processed in parallel. Options A and B wrongly tie the setting to executor memory. Option C wrongly claims that Spark reads only the first 200 partitions. Option D wrongly applies the setting to all DataFrames, including existing ones, when it only governs data being shuffled.
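To make option E concrete, here is a simplified pure-Python model of a shuffle. It is not Spark code. Every row is routed by a hash of its key to one of `num_partitions` output partitions, and the default of 200 mirrors `spark.sql.shuffle.partitions`. The names `shuffle` and `partition_for` are hypothetical. `zlib.crc32` stands in for Spark's Murmur3-based hash, so real partition assignments differ.

```python
import zlib

# Hypothetical constant mirroring the default of spark.sql.shuffle.partitions.
DEFAULT_SHUFFLE_PARTITIONS = 200


def partition_for(key, num_partitions):
    """Pick a target partition for a key (simplified stand-in for hash partitioning)."""
    # crc32 is used only because it is deterministic across runs;
    # Spark itself uses a Murmur3-based hash, so exact assignments differ.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle(rows, key_field, num_partitions=DEFAULT_SHUFFLE_PARTITIONS):
    """Route every row to exactly one of num_partitions output partitions by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_for(row[key_field], num_partitions)].append(row)
    return partitions


if __name__ == "__main__":
    cities = ["Oslo", "Lima", "Pune", "Oslo", "Lima"] * 20
    rows = [{"city": c, "amount": i} for i, c in enumerate(cities)]
    parts = shuffle(rows, "city")
    print(len(parts))                   # 200: fixed by the setting, not by data size
    print(sum(1 for p in parts if p))   # at most 3: only three distinct keys exist
```

The number of output partitions comes from the setting alone. It does not depend on data size or executor memory. With only three distinct keys, at most three of the 200 partitions receive any rows. In real Spark, the value can be read with `spark.conf.get("spark.sql.shuffle.partitions")` and changed for a session with `spark.conf.set(...)`. On Spark 3.2 and later, Adaptive Query Execution is enabled by default and may coalesce these shuffle partitions at runtime.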