Databricks Certified Data Engineer Professional — Question 99
Which indicators would you look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally? Assume you are using Spark’s MEMORY_ONLY storage level.
Answer options
- A. Size on Disk is < Size in Memory
- B. The RDD Block Name includes the “*” annotation signaling a failure to cache
- C. Size on Disk is > 0
- D. The number of Cached Partitions > the number of Spark Partitions
- E. On Heap Memory Usage is within 75% of Off Heap Memory Usage
Correct answer: C
Explanation
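The correct answer is C. With MEMORY_ONLY, every cached block is meant to live in memory, so a healthy cached table shows Size on Disk equal to 0; any value above zero means part of the data is not being served from memory, and reads of it will be slower. Option A is incorrect because Size on Disk being smaller than Size in Memory is the expected state for an in-memory cache (ideally Size on Disk is 0). Option B is incorrect because a “*” annotation on the RDD Block Name is not a documented Storage tab indicator. Option D is incorrect because the number of cached partitions cannot exceed the number of partitions in the table. Option E is incorrect because comparing On Heap to Off Heap Memory Usage says nothing about whether a particular cached table fits in memory. Note that, per the Spark documentation, MEMORY_ONLY itself does not spill to disk: partitions that do not fit are left uncached and recomputed on access, so a Fraction Cached below 100% is also worth checking in practice.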
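The same check can be automated rather than read off the UI. The following is a minimal sketch, not an official tool: it assumes you have already fetched the storage listing as a list of dicts shaped like the entries returned by Spark's monitoring REST API (field names `name`, `diskUsed`, `numPartitions`, `numCachedPartitions` are assumed from that API), and the helper `flag_cache_issues` and the sample data are hypothetical. Besides Size on Disk, it also checks whether every partition is cached, since MEMORY_ONLY leaves partitions that do not fit in memory uncached.

```python
def flag_cache_issues(rdd_infos):
    """Return {name: [reasons]} for cached entries that look suboptimal under MEMORY_ONLY.

    rdd_infos: list of dicts, assumed to mirror the Storage tab columns
    (Size on Disk -> diskUsed, Cached Partitions -> numCachedPartitions).
    """
    findings = {}
    for info in rdd_infos:
        reasons = []
        # Option C from the question: any bytes on disk are a warning sign.
        if info.get("diskUsed", 0) > 0:
            reasons.append("Size on Disk > 0")
        # Fewer cached partitions than total means some are recomputed on access.
        total = info.get("numPartitions", 0)
        cached = info.get("numCachedPartitions", 0)
        if cached < total:
            reasons.append(f"only {cached}/{total} partitions cached")
        if reasons:
            findings[info.get("name", "<unnamed>")] = reasons
    return findings


# Hypothetical sample data for illustration only.
sample = [
    {"name": "healthy_table", "memoryUsed": 1024, "diskUsed": 0,
     "numPartitions": 8, "numCachedPartitions": 8},
    {"name": "on_disk_table", "memoryUsed": 512, "diskUsed": 2048,
     "numPartitions": 8, "numCachedPartitions": 8},
    {"name": "partial_table", "memoryUsed": 512, "diskUsed": 0,
     "numPartitions": 8, "numCachedPartitions": 5},
]

for name, reasons in flag_cache_issues(sample).items():
    print(name, "->", "; ".join(reasons))
```

Keeping the function free of any Spark dependency means it can be run against saved API output or hand-copied values from the Storage tab.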