Databricks Certified Data Engineer Professional — Question 149
Which statement describes Delta Lake optimized writes?
Answer options
- A. Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.
- B. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
- C. A shuffle occurs prior to writing to try to group similar data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.
- D. Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
Correct answer: C
Explanation
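Option C is correct: with optimized writes, data is shuffled before the write so that rows bound for the same table partition are gathered together, and each partition is written as fewer, larger files rather than one small file per executor task. Option A is incorrect because optimized writes act during the write itself; they are not an OPTIMIZE pass triggered when a Jobs cluster terminates. Option B is incorrect because it describes work done after the write completes, which is closer to auto compaction, a separate feature, than to optimized writes. Option D is incorrect because optimized writes do not change how a table is partitioned; partitioned tables still use directory partitions, and the reduction in file count comes from the pre-write shuffle.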
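
The effect described in option C can be illustrated with a toy model in plain Python. This is only a sketch, not Spark or Delta Lake code: the function names, the `date` partition column, and the sample rows are all hypothetical, and the model ignores target file sizes (a real optimized write may still split a large partition into several files). It only counts how many files would be produced when each task writes one file per partition value it holds, versus when rows are regrouped by partition value first.

```python
from collections import defaultdict


def files_without_shuffle(tasks):
    """Toy model: every task writes one file per distinct partition value it holds."""
    return sum(len({row["date"] for row in task}) for task in tasks)


def files_with_shuffle(tasks):
    """Toy model: rows are regrouped by partition value before writing,
    so each partition value ends up in a single file."""
    grouped = defaultdict(list)
    for task in tasks:
        for row in task:
            grouped[row["date"]].append(row)
    return len(grouped)


# Hypothetical input: 4 executor tasks, each holding rows for the same 3 dates.
dates = ["2024-01-01", "2024-01-02", "2024-01-03"]
tasks = [[{"date": d, "value": i} for d in dates] for i in range(4)]

print(files_without_shuffle(tasks))  # 12 (4 tasks x 3 partition values)
print(files_with_shuffle(tasks))     # 3  (one file per partition value)
```

In this model the unshuffled write yields one small file per task per partition directory, while the shuffled write yields one file per partition, which is the trade that option C describes: an extra shuffle in exchange for fewer files.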