Databricks Certified Associate Developer for Apache Spark — Question 46
Which of the following operations can be used to create a new DataFrame that has 12 partitions from an original DataFrame df that has 8 partitions?
Answer options
- A. df.repartition(12)
- B. df.cache()
- C. df.partitionBy(1.5)
- D. df.coalesce(12)
- E. df.partitionBy(12)
Correct answer: A
Explanation
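Option A is correct: df.repartition(12) returns a new DataFrame with exactly 12 partitions. It performs a full shuffle, so it can either increase or decrease the partition count. Option B, df.cache(), lazily caches the DataFrame's contents and does not change its partitioning. Options C and E are invalid because DataFrame has no partitionBy method. partitionBy belongs to DataFrameWriter (df.write.partitionBy(...)), takes column names rather than a number, and controls the directory layout of written output, not the partitioning of a DataFrame in memory. Option D, df.coalesce(12), runs without error but cannot increase the partition count: coalesce only merges existing partitions without a shuffle, so requesting 12 partitions on an 8-partition DataFrame leaves it at 8.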