Google Cloud Professional Data Engineer — Question 295
You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Dataproc and Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?
Answer options
- A. cron
- B. Cloud Composer
- C. Cloud Scheduler
- D. Workflow Templates on Dataproc
Correct answer: B
Explanation
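Cloud Composer is Google Cloud's managed workflow orchestration service, built on Apache Airflow. A pipeline is defined as a directed acyclic graph (DAG) of tasks, so a Dataflow job can be made to wait for a Dataproc job to succeed (or the reverse), and the whole DAG can be given a daily schedule. That covers all three requirements: cross-service dependencies, a managed service, and a daily run. cron can trigger jobs on a schedule, but it is not a managed service and it has no awareness of whether an upstream job has finished or succeeded. Cloud Scheduler is managed, but it is still only a time-based trigger and does not track dependencies between jobs. Workflow Templates on Dataproc can define a graph of dependent jobs, but only Dataproc jobs, so they cannot orchestrate the Dataflow steps, and they do not schedule themselves.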
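To make the difference between time-based scheduling and dependency-based orchestration concrete, here is a minimal sketch in plain Python. It is not Cloud Composer or Airflow code; it only models the idea that an orchestrator runs a task once everything it depends on has finished. The task names and the shape of the graph are hypothetical, chosen for illustration, and do not come from the question.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "dataproc_clean": set(),
    "dataflow_enrich": {"dataproc_clean"},
    "dataproc_aggregate": {"dataproc_clean"},
    "dataflow_publish": {"dataflow_enrich", "dataproc_aggregate"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def runnable_after(dependencies, finished):
    """Return the tasks not yet run whose dependencies are all in `finished`."""
    return sorted(
        task
        for task, deps in dependencies.items()
        if task not in finished and deps <= finished
    )


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(runnable_after(PIPELINE, {"dataproc_clean"}))
```

A purely time-based trigger such as cron or Cloud Scheduler would have to guess how long each step takes. An orchestrator instead evaluates something like `runnable_after` each time a task completes, which is why it suits a pipeline whose Dataproc and Dataflow jobs depend on each other.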