Your team is building several data pipelines that contain a collection of complex tasks a…

Question

Your team is building several data pipelines that contain a collection of complex tasks and dependencies that you want to execute on a schedule, in a specific order. The tasks and dependencies consist of files in Cloud Storage, Apache Spark jobs, and data in BigQuery. You need to design a system that can schedule and automate these data processing tasks using a fully managed approach. What should you do?

Accepted Answer

Correct answer: C. Create directed acyclic graphs (DAGs) in Cloud Composer. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.

Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow, designed for scheduling and managing complex data pipelines with dependencies between tasks. Options A and B cannot manage complex task dependencies in a fully managed way. Option D takes a similar approach but adds the overhead of managing Kubernetes.
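To illustrate why a DAG fits this problem, here is a minimal sketch in plain Python using the standard-library `graphlib` module. It is not Cloud Composer or Airflow code. The task names and the `execution_order` helper are hypothetical, chosen only to mirror the Cloud Storage, Spark, and BigQuery steps in the question. In Cloud Composer, each node would instead be an Airflow operator or sensor, and Airflow's scheduler would run the tasks in dependency order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "wait_for_gcs_file": set(),                # file lands in Cloud Storage
    "run_spark_job": {"wait_for_gcs_file"},    # Spark job processes the file
    "load_into_bigquery": {"run_spark_job"},   # results are loaded into BigQuery
}


def execution_order(dag):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if the graph has no circular dependency."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(execution_order(pipeline))
    # A circular dependency is rejected, which is what "acyclic" guarantees.
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

Because every task lists its upstream dependencies explicitly, the execution order can be derived rather than hard-coded, and a circular dependency is detected before anything runs. An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of that ordering.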

Google Cloud Associate Data Practitioner — Question 38

Explanation