Databricks Certified Data Engineer Associate — Question 77
A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.
Which action can the data engineer perform to improve the startup time for the clusters used for the Job?
Answer options
- A. They can use endpoints available in Databricks SQL
- B. They can use jobs clusters instead of all-purpose clusters
- C. They can configure the clusters to autoscale for larger data sizes
- D. They can use clusters that are from a cluster pool
Correct answer: D
Explanation
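A cluster pool keeps a set of idle, ready-to-use instances, so a cluster that draws from the pool can skip most of the time normally spent acquiring and booting virtual machines from the cloud provider. That directly shortens cluster startup for each task. The other options do not address the startup delay: Databricks SQL endpoints (A) serve SQL workloads and do not speed up the clusters the Job's tasks run on; jobs clusters (B) are created fresh for each run, so they still have to start up; and autoscaling (C) changes the number of workers once a cluster is running, not how quickly it initializes.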
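To make option D concrete, the sketch below builds the two request bodies involved: one describing a pool that keeps idle instances warm, and one describing a job cluster that draws from it. The field names (`instance_pool_name`, `node_type_id`, `min_idle_instances`, `idle_instance_autotermination_minutes`, `instance_pool_id`, `spark_version`, `num_workers`) follow the Databricks Instance Pools and Jobs REST APIs as commonly documented. The helper functions and every value (pool name, node type, runtime version, pool ID) are hypothetical placeholders. Nothing is sent to a workspace.

```python
import json


def pool_spec(name, node_type_id, min_idle_instances, idle_minutes=60):
    """Body for creating an instance pool (illustrative values only)."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Instances kept warm so clusters can attach without waiting
        # for the cloud provider to provision new VMs.
        "min_idle_instances": min_idle_instances,
        "idle_instance_autotermination_minutes": idle_minutes,
    }


def job_cluster_spec(instance_pool_id, spark_version, num_workers):
    """`new_cluster` block for a Job task that draws from a pool.

    The node type is inherited from the pool, so it is not set here.
    """
    return {
        "spark_version": spark_version,
        "instance_pool_id": instance_pool_id,
        "num_workers": num_workers,
    }


if __name__ == "__main__":
    # Placeholder values; substitute ones valid for your workspace.
    pool = pool_spec("nightly-etl-pool", "i3.xlarge", min_idle_instances=4)
    cluster = job_cluster_spec("pool-id-placeholder", "13.3.x-scala2.12", 4)
    print(json.dumps({"pool": pool, "new_cluster": cluster}, indent=2))
```

Sizing `min_idle_instances` to cover the nightly run trades a standing cloud cost for idle machines against faster task startup, so it is worth matching to the Job's actual worker count.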