Google Cloud Associate Data Practitioner — Question 38
Your team is building several data pipelines that contain a collection of complex tasks and dependencies that you want to execute on a schedule, in a specific order. The tasks and dependencies consist of files in Cloud Storage, Apache Spark jobs, and data in BigQuery. You need to design a system that can schedule and automate these data processing tasks using a fully managed approach. What should you do?
Answer options
- A. Use Cloud Scheduler to schedule the jobs to run.
- B. Use Cloud Tasks to schedule and run the jobs asynchronously.
- C. Create directed acyclic graphs (DAGs) in Cloud Composer. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.
- D. Create directed acyclic graphs (DAGs) in Apache Airflow deployed on Google Kubernetes Engine. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.
Correct answer: C
Explanation
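The correct answer is C. Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow, designed for scheduling and managing complex data pipelines whose tasks depend on one another. Each pipeline is defined as a DAG, and Airflow operators let its tasks work with Cloud Storage, Spark, and BigQuery. Option A falls short because Cloud Scheduler is a managed cron service: it can trigger jobs on a schedule, but it has no notion of dependencies or ordering between them. Option B falls short because Cloud Tasks is a queue for dispatching asynchronous work, not a scheduler or orchestrator for multi-step pipelines. Option D offers the same DAG capabilities as C, but running Airflow yourself on Google Kubernetes Engine means your team must deploy, upgrade, and scale it, so it does not meet the fully managed requirement.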
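To make "tasks executed in a specific order" concrete, here is a minimal sketch of the dependency ordering that a DAG encodes. It is plain Python using the standard-library graphlib module, not Cloud Composer or Airflow code, and the task names are hypothetical placeholders for the Cloud Storage, Spark, and BigQuery steps in the question. In an actual Airflow DAG, the same dependencies would be declared between operator tasks (for example with the >> syntax), and the scheduler would take care of running them in order.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical task names (assumptions for illustration only).
# Each key maps to the set of tasks that must finish before it starts.
pipeline = {
    "wait_for_gcs_file": set(),
    "run_spark_job": {"wait_for_gcs_file"},
    "load_into_bigquery": {"run_spark_job"},
    "build_bigquery_report": {"load_into_bigquery"},
}


def execution_order(dag):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_valid_dag(dag):
    """Return False if the dependencies contain a cycle (not acyclic)."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


order = execution_order(pipeline)
print(order)
```

Cloud Scheduler and Cloud Tasks have no equivalent of this dependency map, which is why options A and B cannot guarantee that the Spark job runs only after its input file arrives, or that the BigQuery steps run only after the Spark job succeeds.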