Databricks Certified Associate Developer for Apache Spark — Question 13
The code block shown below contains an error. The code block is intended to cache DataFrame storesDF only in Spark’s memory and then return the number of rows in the cached DataFrame. Identify the error.
Code block:
storesDF.cache().count()
Answer options
- A. The cache() operation caches DataFrames at the MEMORY_AND_DISK level by default – the storage level must be specified to MEMORY_ONLY as an argument to cache().
- B. The cache() operation caches DataFrames at the MEMORY_AND_DISK level by default – the storage level must be set via storesDF.storageLevel prior to calling cache().
- C. The storesDF DataFrame has not been checkpointed – it must have a checkpoint in order to be cached.
- D. DataFrames themselves cannot be cached – DataFrame storesDF must be cached as a table.
- E. The cache() operation can only cache DataFrames at the MEMORY_AND_DISK level (the default) – persist() should be used instead.
Correct answer: E
Explanation
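The correct answer is E. The cache() method takes no arguments and always uses the default storage level, which for DataFrames is MEMORY_AND_DISK, so it cannot meet the requirement of caching only in memory. The persist() method accepts an explicit storage level, so persist(StorageLevel.MEMORY_ONLY) should be used in place of cache(). The other options are wrong for these reasons: A fails because cache() does not accept a storage level argument; B fails because storageLevel is a read-only property that reports the current level rather than setting it; C fails because checkpointing is independent of caching and is not a prerequisite for it; D fails because DataFrames can be cached directly.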