Databricks Certified Data Engineer Professional — Question 190
A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30 seconds. The streaming write is currently configured with a trigger interval of 10 seconds.
Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?
Answer options
- A. Decrease the trigger interval to 5 seconds; triggering batches more frequently allows idle executors to begin processing the next batch while longer running tasks from previous batches finish.
- B. Decrease the trigger interval to 5 seconds; triggering batches more frequently may prevent records from backing up and large batches from causing spill.
- C. The trigger interval cannot be modified without modifying the checkpoint directory; to maintain the current stream state, increase the number of shuffle partitions to maximize parallelism.
- D. Use the trigger once option and configure a Databricks job to execute the query every 10 seconds; this ensures all backlogged records are processed with each batch.
Correct answer: B
Explanation
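Option B is correct. A shorter trigger interval means each microbatch picks up fewer records, so batches stay small during peak load. Small batches are less likely to spill to disk, and records do not pile up while waiting for the next trigger. With a 10-second trigger, a record may already wait up to 10 seconds before its batch even starts, so the interval has to shrink to meet a sub-10-second requirement. Option A proposes the same change but gives the wrong reason: Structured Streaming runs the microbatches of a query sequentially, so the next batch does not start until the previous one has finished, and idle executors cannot pick it up early. Option C is wrong because the trigger interval can be changed when the query is restarted without changing the checkpoint directory. Option D is wrong because every trigger-once run pays job start-up overhead, so launching a job every 10 seconds is likely to add latency rather than remove it.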
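The reasoning behind option B can be illustrated with a small toy model. This is only a sketch under stated assumptions, not a measurement of Spark: the arrival rate, per-record cost, fixed overhead, spill threshold and spill penalty are all hypothetical numbers chosen for illustration, and `simulate` is a made-up helper. The model assumes batches run sequentially and that the next batch starts at the later of the trigger boundary and the end of the previous batch. In PySpark the setting itself would be `trigger(processingTime="5 seconds")` on the stream writer.

```python
def simulate(trigger_s, arrival_rate=1000, horizon_s=120.0,
             overhead_s=0.5, per_record_s=0.0004,
             spill_threshold=8000, spill_factor=6.0):
    """Toy microbatch model; every default value is a hypothetical assumption.

    Returns one (batch_size, duration_s, worst_latency_s) tuple per batch.
    """
    results = []
    consumed_until = 0.0   # records that arrived before this time are already processed
    start = trigger_s      # the first batch fires after one trigger interval
    while start < horizon_s:
        # The batch picks up everything that arrived since the previous batch started.
        n = round(arrival_rate * (start - consumed_until))
        cost = per_record_s * n
        if n > spill_threshold:
            cost *= spill_factor   # assumed penalty once the batch spills to disk
        duration = overhead_s + cost
        end = start + duration
        # The oldest record in the batch arrived at consumed_until and finishes at end.
        results.append((n, duration, end - consumed_until))
        consumed_until = start
        # Batches are sequential: wait for the trigger boundary or the previous batch.
        start = max(end, start + trigger_s)
    return results


if __name__ == "__main__":
    for trigger in (10.0, 5.0):
        batches = simulate(trigger)
        worst = max(latency for _, _, latency in batches)
        sizes = [n for n, _, _ in batches[:3]]
        print(f"trigger={trigger:>4}s first batch sizes={sizes} worst latency={worst:.1f}s")
```

Under these assumed numbers, the 5-second trigger settles into steady 5,000-record batches that stay below the spill threshold and keep worst-case latency under 10 seconds. The 10-second trigger crosses the threshold on its first batch, after which each batch is larger and slower than the one before, which is the backlog effect that option B describes.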