Databricks Certified Data Engineer Professional — Question 155
The business reporting team requires that data for their dashboards be updated every hour. The pipeline that extracts, transforms, and loads the data for these dashboards takes a total of 10 minutes to run.
Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?
Answer options
- A. Configure a job that executes every time new data lands in a given directory
- B. Schedule a job to execute the pipeline once an hour on a new job cluster
- C. Schedule a Structured Streaming job with a trigger interval of 60 minutes
- D. Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
Correct answer: B
Explanation
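Option B is correct: a job scheduled once an hour meets the hourly refresh requirement, and a new job cluster is created for each run and terminated when the run finishes, so compute is billed only for roughly the 10 minutes of processing (plus cluster startup) in each hour. Option A ties execution to file arrival rather than to the hourly requirement, so the job may run more often than needed, or not at all in an hour when no files land. Option C can deliver hourly updates, but a Structured Streaming job with a 60-minute trigger interval keeps its cluster running between triggers, so it pays for the full hour although only 10 minutes are spent processing. Option D also meets the requirement but costs more, because a dedicated interactive (all-purpose) cluster is billed at a higher rate than a job cluster and accrues cost for as long as it is left running.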
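To make the reasoning behind option B concrete, the following is a minimal sketch, not a definitive configuration. It builds a request body in the shape of a Databricks Jobs API 2.1 job definition with an hourly Quartz cron schedule and a `new_cluster` block, and compares daily cluster hours for the two usage patterns. The job name, task key, notebook path, Spark version, node type, and worker count are hypothetical placeholders, and the cost comparison assumes exactly 10 minutes per run and ignores cluster startup time.

```python
import json


def hourly_job_payload(notebook_path, spark_version, node_type_id, num_workers):
    """Build a job definition that runs once an hour on a new job cluster.

    All argument values are supplied by the caller; none are taken from the
    question itself.
    """
    return {
        "name": "hourly-dashboard-refresh",  # hypothetical job name
        "schedule": {
            # Quartz cron: second 0, minute 0 of every hour.
            "quartz_cron_expression": "0 0 * * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        "tasks": [
            {
                "task_key": "run_pipeline",  # hypothetical task key
                "notebook_task": {"notebook_path": notebook_path},
                # A new job cluster is created per run and terminated afterwards.
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type_id,
                    "num_workers": num_workers,
                },
            }
        ],
    }


def daily_cluster_hours(minutes_per_run, runs_per_day):
    """Cluster hours consumed per day, ignoring startup time (an assumption)."""
    return minutes_per_run * runs_per_day / 60


if __name__ == "__main__":
    payload = hourly_job_payload(
        notebook_path="/Repos/example/etl_pipeline",  # placeholder path
        spark_version="<spark-version>",              # placeholder
        node_type_id="<node-type-id>",                # placeholder
        num_workers=2,                                # placeholder
    )
    print(json.dumps(payload, indent=2))

    # Job cluster: 10 minutes per run, 24 runs per day.
    print(daily_cluster_hours(10, 24))
    # Always-on cluster (options C and D): 60 minutes per hour, 24 hours.
    print(daily_cluster_hours(60, 24))
```

Under these assumptions the scheduled job cluster is active for 4 hours per day, compared with 24 hours for a cluster that stays up continuously, before accounting for any difference in the per-hour rate between job and interactive clusters.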