Databricks Certified Associate Developer for Apache Spark — Question 161
Which of the following code blocks attempts to cache the partitions of DataFrame storesDF only in Spark’s memory?
Answer options
- A. storesDF.cache(StorageLevel.MEMORY_ONLY).count()
- B. storesDF.persist().count()
- C. storesDF.cache().count()
- D. storesDF.persist(StorageLevel.MEMORY_ONLY).count()
- E. storesDF.persist("MEMORY_ONLY").count()
Correct answer: D
Explanation
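The correct answer is D. persist() accepts a StorageLevel argument, and StorageLevel.MEMORY_ONLY asks Spark to keep the partitions in memory only; partitions that do not fit are recomputed when needed rather than written to disk. Caching is lazy, so the trailing count() is the action that actually materializes it. Option A fails because cache() takes no arguments, so a storage level cannot be passed to it. Options B and C run, but both use the default storage level for DataFrames, MEMORY_AND_DISK, which spills partitions that do not fit in memory to disk. Option E fails because persist() expects a StorageLevel object, not a string.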