Databricks Certified Data Engineer Professional — Question 114
The data engineer is using Spark's MEMORY_ONLY storage level.
Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?
Answer options
- A. On Heap Memory Usage is within 75% of Off Heap Memory Usage
- B. The RDD Block Name includes the “*” annotation signaling a failure to cache
- C. Size on Disk is > 0
- D. The number of Cached Partitions > the number of Spark Partitions
Correct answer: C
Explanation
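The correct answer is C. With the MEMORY_ONLY storage level, cached partitions are meant to live entirely in executor memory, so the Storage tab should report Size on Disk as 0. A value greater than 0 means cached blocks are sitting on disk and reads of those blocks will be slower than memory reads, so the cache is not delivering the performance MEMORY_ONLY is meant to provide. The other options are not valid indicators. A: the ratio of on-heap to off-heap memory usage says nothing about whether a particular table is cached well. B: a "*" annotation on the RDD block name is not something the Storage tab uses to signal a caching failure. D: the number of cached partitions cannot exceed the number of partitions in the dataset; a real warning sign would be the opposite case, where fewer partitions are cached than exist.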
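The check described above can also be automated. The following is a minimal sketch, not a definitive implementation: it assumes the storage information has already been fetched as a list of dictionaries shaped like the entries returned by Spark's monitoring REST API storage endpoint (fields `name`, `numPartitions`, `numCachedPartitions`, `memoryUsed`, `diskUsed`). The function name `flag_suboptimal_caches` and the sample rows are hypothetical and exist only for illustration. The second check (fewer cached partitions than total partitions) is an additional signal that is not among the answer options.

```python
from typing import Dict, List


def flag_suboptimal_caches(rdd_storage: List[Dict]) -> List[str]:
    """Return one warning string per suboptimal signal found.

    Each entry is assumed to look like one element of the Spark
    monitoring REST API storage listing (an assumption of this sketch).
    """
    warnings = []
    for entry in rdd_storage:
        name = entry.get("name", "<unnamed>")

        # With MEMORY_ONLY, nothing should be stored on disk.
        disk_used = entry.get("diskUsed", 0)
        if disk_used > 0:
            warnings.append(f"{name}: Size on Disk is {disk_used} bytes (> 0)")

        # Partitions that are not cached must be recomputed when read.
        cached = entry.get("numCachedPartitions", 0)
        total = entry.get("numPartitions", 0)
        if cached < total:
            warnings.append(f"{name}: only {cached} of {total} partitions cached")
    return warnings


# Hypothetical sample rows, invented for illustration only.
sample = [
    {"name": "orders", "numPartitions": 8, "numCachedPartitions": 8,
     "memoryUsed": 1048576, "diskUsed": 0},
    {"name": "events", "numPartitions": 8, "numCachedPartitions": 8,
     "memoryUsed": 524288, "diskUsed": 262144},
    {"name": "clicks", "numPartitions": 8, "numCachedPartitions": 5,
     "memoryUsed": 786432, "diskUsed": 0},
]

for line in flag_suboptimal_caches(sample):
    print(line)
```

In this sample, "orders" passes both checks, "events" is flagged for a nonzero Size on Disk, and "clicks" is flagged because only part of it is cached.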