Google Cloud Professional Data Engineer — Question 250
You are designing a stateful data processing pipeline that reads data from a Cloud Storage bucket and writes transformed data to a BigQuery table. The pipeline must be highly available and resilient to zonal failures within the us-central1 region. You need to configure the Dataflow pipeline so that a zonal outage causes minimal disruption. What should you do?
Answer options
- A. Launch the Dataflow job with the --region=us-central1 parameter.
- B. Deploy the Dataflow job to a single zone within us-central1 and configure it to use a regional persistent disk to store its state.
- C. Deploy the Dataflow job to a single zone within us-central1 and use a multi-regional Cloud Storage bucket to store its state.
- D. Launch the Dataflow job with the --zone=us-central1-a parameter.
Correct answer: A
Explanation
The correct answer is A. Launching the Dataflow job with --region=us-central1 and no zone lets the Dataflow service pick the worker zone within us-central1 itself (automatic zone placement), so the job is not locked to a zone you chose up front and that a zonal outage could take down. Options B and C both pin the job to a single zone; keeping state on a regional persistent disk or in a multi-regional Cloud Storage bucket does not keep the job's workers running when that zone fails. Option D explicitly pins the job to us-central1-a, which provides no zonal redundancy.
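The distinction between options A and D comes down to which launch flags are set. As a minimal sketch, here is a hypothetical pre-launch check (`check_zonal_resilience` is an illustrative name, not part of any Google or Beam API) that flags a job whose arguments pin a zone or omit the region. It assumes the Beam Python SDK spellings `--region`, `--zone` (older, deprecated) and `--worker_zone`; verify those names against your SDK version.

```python
import argparse

# Flags that pin Dataflow workers to one zone. ASSUMPTION: these are the
# Beam Python SDK spellings (--zone is the older, deprecated form).
ZONE_FLAGS = ("--zone", "--worker_zone")


def check_zonal_resilience(argv):
    """Hypothetical helper: return (ok, reason) for a list of launch flags.

    A job counts as zone-resilient here if it does not pin a zone and does
    set --region, leaving zone selection to the Dataflow service.
    """
    # allow_abbrev=False so unrelated flags are never prefix-matched.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--region")
    for flag in ZONE_FLAGS:
        parser.add_argument(flag)
    # Ignore every other pipeline option (runner, project, temp_location, ...).
    known, _ = parser.parse_known_args(argv)

    pinned = [f for f in ZONE_FLAGS if getattr(known, f.lstrip("-")) is not None]
    if pinned:
        return False, "zone pinned via " + ", ".join(pinned)
    if not known.region:
        return False, "no --region set"
    return True, "region only; Dataflow selects the zone"


if __name__ == "__main__":
    # Option A: region only.
    print(check_zonal_resilience(["--runner=DataflowRunner", "--region=us-central1"]))
    # -> (True, 'region only; Dataflow selects the zone')
    # Option D: zone pinned.
    print(check_zonal_resilience(["--runner=DataflowRunner", "--zone=us-central1-a"]))
    # -> (False, 'zone pinned via --zone')
```

In a real pipeline, the same argument list would then be handed to Beam's `PipelineOptions`; the check only inspects the flags and does not contact Google Cloud.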