Databricks Certified Data Engineer Professional — Question 86
Which statement describes Delta Lake optimized writes?
Answer options
- A. Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
- B. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
- C. Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
- D. Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
- E. A shuffle occurs prior to writing to try to group similar data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.
Correct answer: E
Explanation
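E is correct. With optimized writes enabled, Delta Lake shuffles the data before writing so that rows bound for the same partition are brought together. Each partition is then written as a small number of larger files, rather than every executor task writing its own file into every partition directory it holds rows for. The other options describe different mechanisms:
- A and B describe compaction that runs after the write has finished. B most closely resembles auto compaction, which is a separate feature from optimized writes.
- C describes buffering through a message bus, which is not how Delta Lake commits data.
- D is incorrect because optimized writes do not replace directory partitions with "logical partitions"; the partition layout stays the same and only the number and size of files written into each partition changes.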
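The effect of the pre-write shuffle on file counts can be illustrated with a toy model in plain Python. This is only a sketch for intuition, not Delta Lake's implementation: the function names, the task layout, and `max_rows_per_file` (a stand-in for a target file size) are all hypothetical.

```python
from collections import defaultdict


def files_without_shuffle(tasks):
    """Each task writes one file per distinct partition value it holds."""
    return sum(len({key for key, _ in rows}) for rows in tasks)


def files_with_shuffle(tasks, max_rows_per_file):
    """Group rows by partition value first, then write size-bounded files."""
    rows_per_key = defaultdict(int)
    for rows in tasks:
        for key, _ in rows:
            rows_per_key[key] += 1
    # Ceiling division: how many files each partition value needs.
    return sum(-(-count // max_rows_per_file) for count in rows_per_key.values())


# Hypothetical workload: 8 tasks, each holding 3 rows for each of 4 dates.
dates = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
tasks = [[(d, i) for d in dates for i in range(3)] for _ in range(8)]

before = files_without_shuffle(tasks)
after = files_with_shuffle(tasks, max_rows_per_file=100)
print(before, after)  # 32 4
```

In this model every task touches every date, so without a shuffle the 8 tasks produce 8 x 4 = 32 small files. After grouping by date, each date's 24 rows fit in one file, giving 4 files. On Databricks the real feature is enabled per table with the `delta.autoOptimize.optimizeWrite` table property.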