Databricks Certified Data Engineer Professional — Question 193
A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. This happened even though the critical field was in the Kafka source. That field was further missing from data written to dependent, long-term storage. The retention threshold on the Kafka service is seven days. The pipeline has been in production for three months.
Which describes how Delta Lake can help to avoid data loss of this nature in the future?
Answer options
- A. The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.
- B. Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.
- C. Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.
- D. Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.
Correct answer: D
Explanation
The correct answer is D because ingesting all raw data into a bronze Delta table ensures that a complete record of the data is maintained, allowing for recovery and replay if necessary. Options A, B, and C do not directly address the prevention of data loss in the way that maintaining a permanent history does, as they focus more on tracking or validating data rather than preserving it.