Databricks Certified Data Engineer Professional — Question 71
The business reporting team requires that the data for their dashboards be updated every hour. The pipeline that extracts, transforms, and loads this data completes in 10 minutes.
Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?
Answer options
- A. Manually trigger a job anytime the business reporting team refreshes their dashboards
- B. Schedule a job to execute the pipeline once an hour on a new job cluster
- C. Schedule a Structured Streaming job with a trigger interval of 60 minutes
- D. Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster
- E. Configure a job that executes every time new data lands in a given directory
Correct answer: B
Explanation
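Option B is correct because a scheduled hourly run on a new job cluster meets the hourly update requirement while paying for compute only while the pipeline runs. The job cluster is created for the run and terminated when it finishes, so roughly 10 minutes of compute (plus cluster startup) is billed per hour. The other options fall short on reliability or cost:
- A depends on manual intervention, so it cannot guarantee an hourly refresh.
- C keeps a cluster running continuously for a Structured Streaming job that only does work once every 60 minutes.
- D meets the schedule, but interactive (all-purpose) clusters are billed at a higher rate than job clusters and typically remain running between runs.
- E ties execution to data arrival rather than to the hourly requirement, so the pipeline may run more often than needed.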
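As a rough illustration of option B, the sketch below builds a job definition in the shape used by the Databricks Jobs API 2.1 (a `schedule` with a Quartz cron expression, and a task with a `new_cluster` block). The job name, notebook path, runtime version, node type, and worker count are hypothetical placeholders, not values from the question. The helper `daily_compute_hours` is also an assumption added for illustration; it ignores cluster startup time.

```python
import json


def build_hourly_job(notebook_path: str) -> dict:
    """Return a Jobs API 2.1-style payload: hourly schedule, new job cluster.

    All concrete values below are hypothetical placeholders.
    """
    return {
        "name": "hourly-reporting-refresh",  # hypothetical job name
        "schedule": {
            # Quartz cron: second 0, minute 0 of every hour.
            "quartz_cron_expression": "0 0 * * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        "tasks": [
            {
                "task_key": "etl",
                "notebook_task": {"notebook_path": notebook_path},
                # new_cluster requests a job cluster that is created for
                # the run and terminated when the run completes.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder
                    "node_type_id": "i3.xlarge",          # placeholder
                    "num_workers": 2,                     # placeholder
                },
            }
        ],
    }


def daily_compute_hours(run_minutes: float, runs_per_day: int) -> float:
    """Cluster hours per day when compute exists only while a run executes."""
    return run_minutes * runs_per_day / 60


payload = build_hourly_job("/Repos/reporting/etl_pipeline")  # hypothetical path
print(json.dumps(payload, indent=2))

# A 10-minute pipeline run 24 times a day, versus a cluster left on all day.
print(daily_compute_hours(10, 24))  # 4.0
```

Under these assumptions the job cluster is up for about 4 hours per day, compared with 24 hours for a cluster that stays running, which is the cost argument behind choosing B over C and D.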