You are running a pipeline in Dataflow that receives messages from a Pub/Sub topic and wr…

Question

You are running a pipeline in Dataflow that receives messages from a Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose two.)

Accepted Answer

Correct answer: A, B

A. Increase the number of max workers
B. Use a larger instance type for your Dataflow workers

The pipeline is CPU-bound: during peak periods all 3 workers run at maximum CPU utilization, and the job is already at its worker limit. Increasing the maximum number of workers (option A) lets Dataflow process more records in parallel, which improves throughput under peak load. Using a larger instance type (option B) gives each worker more CPU; n1-standard-1 provides only a single vCPU. Changing the zone (option C) adds no compute capacity, so it does not address the bottleneck. Creating temporary tables in Bigtable or Cloud Spanner (options D and E) adds complexity and latency without relieving the CPU saturation.
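
As a rough illustration of options A and B, the sketch below builds the command-line flags a Beam Python pipeline on Dataflow could be launched with. The helper name `dataflow_scaling_args` is hypothetical, and the default values (10 workers, n1-standard-4) are assumptions chosen for the example, not values from the question. The flag names `--max_num_workers` and `--worker_machine_type` are standard Dataflow pipeline options.

```python
from typing import List


def dataflow_scaling_args(
    max_workers: int = 10,                 # assumed value, not from the question
    machine_type: str = "n1-standard-4",   # assumed value, not from the question
    region: str = "europe-west4",          # region stated in the question
) -> List[str]:
    """Build Dataflow flags that raise the worker cap and the worker size."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return [
        "--runner=DataflowRunner",
        "--streaming",                            # Pub/Sub source, so a streaming job
        f"--region={region}",
        f"--max_num_workers={max_workers}",       # option A: more workers in parallel
        f"--worker_machine_type={machine_type}",  # option B: more CPU per worker
    ]


if __name__ == "__main__":
    # Print the flags as they would appear on a launch command line.
    print(" ".join(dataflow_scaling_args()))
```

The returned list is meant to be passed to Beam's `PipelineOptions` or appended to the launch command. How the change is applied to a job that is already running is outside the scope of this sketch.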

Google Cloud Professional Data Engineer — Question 65
