Google Cloud Professional Data Engineer — Question 65
You are running a pipeline in Dataflow that receives messages from a Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers of instance type n1-standard-1. You notice that during peak periods, all 3 workers are at maximum CPU utilization and your pipeline struggles to process records in a timely fashion. Which two actions can you take to increase performance of your pipeline? (Choose two.)
Answer options
- A. Increase the number of max workers
- B. Use a larger instance type for your Dataflow workers
- C. Change the zone of your Dataflow pipeline to run in us-central1
- D. Create a temporary table in Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Bigtable to BigQuery
- E. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
Correct answer: A, B
Explanation
The symptom is CPU saturation: all 3 workers are at maximum CPU utilization and the job is already at its worker cap, so it cannot scale out any further. Raising the maximum number of workers (option A) lets Dataflow add workers during peaks and process more records in parallel. Using a larger instance type (option B) gives each worker more CPU; an n1-standard-1 provides only 1 vCPU per worker. Moving the pipeline to us-central1 (option C) adds no compute capacity, and it would place the workers far from the EU BigQuery dataset, which adds cross-region latency. Buffering through a temporary table in Bigtable or Cloud Spanner (options D and E) adds an extra write step and a second pipeline, which increases complexity and latency without relieving the CPU bottleneck.
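
As a rough illustration of options A and B, the sketch below builds launch flags for a Dataflow job using the Apache Beam Python SDK and compares peak vCPU capacity before and after the change. The helper names `dataflow_args` and `peak_vcpus`, and the target values of 10 workers and n1-standard-4, are assumptions chosen for illustration; they are not part of the question. The flag names are the ones the Beam Python SDK accepts, but the right values depend on the workload.

```python
# vCPU counts for a few N1 standard machine types.
N1_STANDARD_VCPUS = {
    "n1-standard-1": 1,
    "n1-standard-2": 2,
    "n1-standard-4": 4,
    "n1-standard-8": 8,
}


def dataflow_args(max_workers, machine_type, region="europe-west4"):
    """Hypothetical helper: return Beam Python SDK flags for a streaming job."""
    return [
        "--runner=DataflowRunner",
        "--streaming",
        f"--region={region}",  # stay in the EU, next to the BigQuery dataset
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        f"--max_num_workers={max_workers}",        # option A
        f"--worker_machine_type={machine_type}",   # option B
    ]


def peak_vcpus(max_workers, machine_type):
    """Hypothetical helper: total vCPUs available when fully scaled out."""
    return max_workers * N1_STANDARD_VCPUS[machine_type]


current = peak_vcpus(3, "n1-standard-1")
proposed = peak_vcpus(10, "n1-standard-4")
print(f"peak vCPUs: {current} -> {proposed}")
print(" ".join(dataflow_args(10, "n1-standard-4")))
```

The resulting list could be passed to `PipelineOptions` when constructing the pipeline. Either change alone raises the CPU ceiling; combining them multiplies the effect, at a correspondingly higher cost.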