Databricks Certified Data Engineer Professional — Question 161
A data pipeline uses Structured Streaming to ingest data from Apache Kafka to Delta Lake. Data is being stored in a bronze table, and includes the Kafka-generated timestamp, key, and value. Three months after the pipeline is deployed, the data engineering team has noticed some latency issues during certain times of the day.
A senior data engineer updates the Delta Table's schema and ingestion logic to include the current timestamp (as recorded by Apache Spark) as well as the Kafka topic and partition. The team plans to use these additional metadata fields to diagnose the transient processing delays.
Which limitation will the team face while diagnosing this problem?
Answer options
- A. New fields will not be computed for historic records.
- B. Spark cannot capture the topic and partition fields from a Kafka source.
- C. Updating the table schema requires a default value provided for each field added.
- D. Updating the table schema will invalidate the Delta transaction log metadata.
Correct answer: A
Explanation
The correct answer is A because adding new fields to the Delta Table schema will not retroactively compute values for existing historical records, meaning the new metadata will only be available for new records moving forward. Option B is incorrect because Spark is capable of capturing Kafka topic and partition information. Option C is not correct as adding fields does not require default values for existing records. Lastly, option D is incorrect since updating the schema does not invalidate the Delta transaction log metadata.