Google Cloud Associate Data Practitioner — Question 51

You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address the problem. You need to ensure that the final output is generated as quickly as possible. What should you do?

Answer options

Correct answer: D

Explanation

D is correct because modeling the pipeline as a directed acyclic graph (DAG) in Cloud Composer makes each stage a separate task with explicit dependencies. When a stage fails, you fix the problem and rerun only the failed task and the tasks downstream of it, while the output of stages that already succeeded is reused. Options A, B, and C depend on manual intervention or on restarting the entire pipeline, both of which delay the final output.
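
The time saved by resuming from the failed stage can be illustrated with a small model. The sketch below is plain Python, not the Cloud Composer or Airflow API; the stage names and the helpers run_pipeline and clear_failed are hypothetical and exist only for illustration. It mimics the behavior the explanation relies on: completed stages are skipped, a failed stage blocks everything downstream, and clearing the failed state lets the next run pick up from that stage.

```python
from typing import Callable, Dict, List

# Hypothetical stage names; the question does not name the stages.
STAGES: List[str] = ["ingest", "transform", "aggregate", "publish"]


def run_pipeline(
    stages: List[str],
    state: Dict[str, str],
    run_stage: Callable[[str], None],
) -> List[str]:
    """Run stages in order and return the stages executed in this call.

    Stages marked "success" are skipped. A stage marked "failed" blocks
    the run until its state is cleared.
    """
    executed: List[str] = []
    for stage in stages:
        status = state.get(stage)
        if status == "success":
            continue  # earlier output is reused, not recomputed
        if status == "failed":
            break  # stays blocked until the failed state is cleared
        executed.append(stage)
        try:
            run_stage(stage)
        except Exception:
            state[stage] = "failed"
            break  # downstream stages depend on this one
        state[stage] = "success"
    return executed


def clear_failed(state: Dict[str, str]) -> None:
    """Remove the "failed" marks so the next run retries those stages."""
    for stage in [s for s, status in state.items() if status == "failed"]:
        del state[stage]


if __name__ == "__main__":
    broken = {"transform"}  # simulate a stage that fails on the first run

    def run_stage(stage: str) -> None:
        if stage in broken:
            raise RuntimeError(f"{stage} failed")

    state: Dict[str, str] = {}
    print(run_pipeline(STAGES, state, run_stage))  # first attempt
    broken.clear()       # the underlying problem is fixed
    clear_failed(state)  # analogous to clearing the failed task
    print(run_pipeline(STAGES, state, run_stage))  # resumes mid-pipeline
```

In this model the second run never re-executes "ingest", which is the property that makes option D the fastest path to the final output when each stage is long-running.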