Databricks Certified Data Engineer Associate — Question 82
What is used by Spark to record the offset range of the data being processed in each trigger in order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing?
Answer options
- A. Checkpointing and Write-ahead Logs
- B. Replayable Sources and Idempotent Sinks
- C. Write-ahead Logs and Idempotent Sinks
- D. Checkpointing and Idempotent Sinks
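Correct answer: A
Explanation
The correct answer is A. Structured Streaming uses checkpointing and write-ahead logs to record the offset range of the data being processed in each trigger, so after a failure the engine can restart from the last recorded progress and reprocess exactly the same range. Replayable sources and idempotent sinks (option B) are the other half of the fault-tolerance design: replayable sources let the engine re-read a recorded offset range, and idempotent sinks let reprocessed output be written without creating duplicates. Combined with checkpointing and write-ahead logs they give end-to-end exactly-once guarantees, but they are not what records the offsets. Options C and D each pair one of the recording mechanisms with idempotent sinks, which deal with duplicate writes rather than progress tracking.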
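The division of labor described in the explanation can be illustrated with a toy model in plain Python. This is a minimal sketch, not Spark's implementation: the class `ToyStream`, its `run_batch` method, and the `offsets-N.json` / `commits-N.json` file names are all hypothetical, chosen only to show an offset range being logged before processing and replayed after a simulated crash. In real Structured Streaming the checkpoint directory is set with the `checkpointLocation` option on the stream writer.

```python
import json
import os
import tempfile


class ToyStream:
    """Toy micro-batch loop. Hypothetical names; NOT how Spark is implemented."""

    def __init__(self, source, checkpoint_dir, sink):
        self.source = source        # replayable source: a list we can re-read by index
        self.dir = checkpoint_dir   # stands in for a checkpoint location
        self.sink = sink            # idempotent sink: a dict keyed by batch id

    def _path(self, kind, batch_id):
        return os.path.join(self.dir, f"{kind}-{batch_id}.json")

    def _latest(self, kind):
        # Highest batch id that has a log file of this kind, or -1 if none.
        ids = [int(name.split("-")[1].split(".")[0])
               for name in os.listdir(self.dir)
               if name.startswith(kind + "-")]
        return max(ids, default=-1)

    def run_batch(self, batch_size, crash_before_commit=False):
        last_planned = self._latest("offsets")
        last_committed = self._latest("commits")
        if last_planned > last_committed:
            # Restart after a failure: replay the exact range that was logged.
            batch_id = last_planned
            with open(self._path("offsets", batch_id)) as f:
                logged = json.load(f)
            start, end = logged["start"], logged["end"]
        else:
            batch_id = last_committed + 1
            start = 0
            if last_committed >= 0:
                with open(self._path("offsets", last_committed)) as f:
                    start = json.load(f)["end"]
            end = min(start + batch_size, len(self.source))
            # Write-ahead: record the offset range BEFORE processing it.
            with open(self._path("offsets", batch_id), "w") as f:
                json.dump({"start": start, "end": end}, f)
        # Idempotent write: re-running a batch overwrites the same key.
        self.sink[batch_id] = self.source[start:end]
        if crash_before_commit:
            raise RuntimeError("simulated failure before commit")
        with open(self._path("commits", batch_id), "w") as f:
            json.dump({}, f)
        return batch_id


checkpoint_dir = tempfile.mkdtemp()
source = list(range(10))
sink = {}

stream = ToyStream(source, checkpoint_dir, sink)
stream.run_batch(4)                                # batch 0 completes
try:
    stream.run_batch(4, crash_before_commit=True)  # batch 1 fails before commit
except RuntimeError:
    pass

# "Restart": a fresh object that only knows the checkpoint directory.
restarted = ToyStream(source, checkpoint_dir, sink)
restarted.run_batch(100)  # replays the logged range for batch 1; batch_size is ignored
restarted.run_batch(4)    # batch 2 picks up where batch 1 ended

result = [x for batch_id in sorted(sink) for x in sink[batch_id]]
```

In this sketch the logged offset range is what lets the restarted run reprocess the failed batch with the same boundaries, while the dict keyed by batch id plays the role of the idempotent sink that prevents the retried batch from producing duplicates.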