Databricks Certified Associate Developer for Apache Spark — Question 3

Of the following situations, in which will it be most advantageous to store DataFrame df at the MEMORY_AND_DISK storage level rather than the MEMORY_ONLY storage level?

Answer options

Correct answer: D

Explanation

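The correct answer is D. With MEMORY_ONLY, partitions of df that do not fit in memory are not cached and must be recomputed from the lineage each time they are needed. With MEMORY_AND_DISK, those partitions are written to disk and read back from there instead. Reading from disk is usually cheaper than recomputing, especially when the logical plan that produces df is complex. Options A, B, and C do not favor MEMORY_AND_DISK because they describe situations in which either df fits entirely in memory or recomputation is fast. Option E is incorrect because it ignores the case in which df does not fit in memory.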
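
The trade-off can be illustrated with a small toy model in plain Python. This is only a sketch under simplifying assumptions, not how Spark's block manager actually works: the function simulate, the class AccessCost, and the parameters memory_slots and passes are hypothetical names introduced here, and the model assumes a fixed set of partitions stays in memory with no eviction between passes. It counts how each partition is served when a cached dataset is read several times.

```python
from dataclasses import dataclass


@dataclass
class AccessCost:
    memory_hits: int = 0  # partitions served straight from memory
    disk_reads: int = 0   # partitions read back from local disk
    recomputed: int = 0   # partitions rebuilt from the lineage


def simulate(num_partitions, memory_slots, spill_to_disk, passes=1):
    """Toy model (hypothetical, not Spark internals).

    spill_to_disk=False mimics MEMORY_ONLY: overflow partitions are recomputed.
    spill_to_disk=True mimics MEMORY_AND_DISK: overflow partitions are read from disk.
    """
    cost = AccessCost()
    in_memory = min(num_partitions, memory_slots)
    overflow = num_partitions - in_memory
    for _ in range(passes):
        cost.memory_hits += in_memory
        if spill_to_disk:
            cost.disk_reads += overflow
        else:
            cost.recomputed += overflow
    return cost


# 10 partitions, room for 6 in memory, dataset read 3 times.
memory_only = simulate(10, 6, spill_to_disk=False, passes=3)
memory_and_disk = simulate(10, 6, spill_to_disk=True, passes=3)
print(memory_only)
print(memory_and_disk)
```

In this model the two levels behave identically whenever num_partitions <= memory_slots, which is why fitting in memory removes the advantage of MEMORY_AND_DISK. They differ only in how the overflow is served, so the benefit grows with the cost of recomputing a partition. In PySpark the storage level is chosen with df.persist(StorageLevel.MEMORY_AND_DISK) or df.persist(StorageLevel.MEMORY_ONLY), where StorageLevel is imported from pyspark.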