A Structured Streaming job deployed to production has been resulting in higher than expec…

Question

A Structured Streaming job deployed to production has been resulting in higher than expected cloud storage costs. At present, during normal execution, each microbatch of data is processed in less than 3s; at least 12 times per minute, a microbatch is processed that contains 0 records. The streaming write was configured using the default trigger settings. The production job is currently scheduled alongside many other Databricks jobs in a workspace with instance pools provisioned to reduce start-up time for jobs with batch execution. Holding all other variables constant and assuming records need to be processed in less than 10 minutes, which adjustment will meet the requirement?

Accepted Answer

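Correct answer: E. Use the trigger once option and configure a Databricks job to execute the query every 10 minutes; this approach minimizes costs for both compute and storage.

With the default trigger settings, the query keeps polling the source for new data, so it runs microbatches even when nothing has arrived; the question reports at least 12 empty microbatches per minute. Each poll still issues requests against cloud storage, which is the likely source of the higher than expected storage costs. With the trigger once option, each run processes the data that has accumulated since the previous run and then stops, so a Databricks job scheduled every 10 minutes only touches storage during those runs. The instance pools already provisioned in the workspace keep job start-up time low. The other options either leave the empty microbatches in place or query the source more often, which would raise costs rather than lower them.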
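As an illustration of answer E, here is a minimal PySpark sketch, assuming a streaming DataFrame has already been defined elsewhere. The function name `write_once`, the Delta output format, and both path arguments are assumptions made for this example and do not come from the question. The 10-minute schedule is configured on the Databricks job itself, not in this code.

```python
def write_once(stream_df, target_path, checkpoint_path):
    """Process the data currently available, then stop.

    stream_df is assumed to be a streaming DataFrame (for example, one
    created with spark.readStream). target_path and checkpoint_path are
    hypothetical placeholders.
    """
    query = (
        stream_df.writeStream
        .format("delta")  # assumption: the sink is a Delta table
        .outputMode("append")
        # The checkpoint lets each scheduled run resume where the last one stopped.
        .option("checkpointLocation", checkpoint_path)
        # Run a single microbatch over the available data and then shut down,
        # instead of polling the source continuously.
        .trigger(once=True)
        .start(target_path)
    )
    # Block until the run finishes so the scheduled job can terminate.
    query.awaitTermination()
    return query
```

Because the checkpoint records progress between runs, each scheduled execution should pick up only the records that arrived since the previous one.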

Databricks Certified Data Engineer Professional — Question 72

Answer options

Correct answer: E

Explanation
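
As a rough back-of-the-envelope check on the figures in the question, the sketch below compares how often the query runs under each setup. It uses only the "at least 12 empty microbatches per minute" figure and the 10-minute schedule. The function names are hypothetical, and the calculation does not model actual cloud storage request pricing.

```python
MINUTES_PER_DAY = 24 * 60

def empty_batches_per_day(empty_per_minute=12):
    # Lower bound: the question says "at least" 12 empty microbatches per minute.
    return empty_per_minute * MINUTES_PER_DAY

def scheduled_runs_per_day(interval_minutes=10):
    # One trigger-once run per scheduled job execution.
    return MINUTES_PER_DAY // interval_minutes

print(empty_batches_per_day())   # 17280
print(scheduled_runs_per_day())  # 144
```

Under these assumptions, the scheduled job runs 144 times per day, while the default trigger processes at least 17,280 empty microbatches over the same period.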