Google Cloud Professional Data Engineer — Question 183
You are running a Dataflow streaming pipeline, with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipeline is Pub/Sub messages with notifications from Cloud Storage. One of the pipeline transforms reads CSV files and emits an element for every CSV line. The job performance is low, the pipeline is using only 10 workers, and you notice that the autoscaler is not spinning up additional workers. What should you do to improve performance?
Answer options
- A. Enable Vertical Autoscaling to let the pipeline use larger workers.
- B. Change the pipeline code, and introduce a Reshuffle step to prevent fusion.
- C. Update the job to increase the maximum number of workers.
- D. Use Dataflow Prime, and enable Right Fitting to increase the worker resources.
Correct answer: B
Explanation
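The correct answer is B. Dataflow fuses adjacent steps into a single stage to avoid materializing intermediate data. Here the input is one Pub/Sub notification per file, so the fused stage sees only a small number of elements, and the CSV fan-out plus all downstream work stays on the few workers that received those notifications. The autoscaler sees little parallelizable work and does not add workers. Inserting a Reshuffle step after the transform that emits one element per CSV line breaks fusion at that point and redistributes the lines, so downstream steps can run on many workers and the autoscaler has work to scale against.

Option A is incorrect because Vertical Autoscaling changes the memory available to each worker, not the amount of parallel work. Option C is incorrect because the job uses only 10 of the 1000 workers already allowed, so raising the limit changes nothing. Option D is incorrect because Right Fitting adjusts the resources assigned to individual steps and does not break fusion.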
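
The Reshuffle placement from option B can be illustrated with a minimal sketch, assuming the Apache Beam Python SDK (the question does not name a language). The names `parse_csv_lines`, `build_pipeline`, `read_rows`, and the step labels are hypothetical, `beam.Create` stands in for the Pub/Sub notification source, and the final `Map` is a placeholder for whatever the real pipeline does per line.

```python
import csv
import io


def parse_csv_lines(text):
    """Yield one list of fields per CSV row in the given file contents."""
    yield from csv.reader(io.StringIO(text))


def build_pipeline(pipeline, file_paths):
    """Attach a file-path -> CSV-row graph to an existing Beam pipeline.

    apache_beam is imported lazily so the parsing helper above stays
    usable without the SDK installed.
    """
    import apache_beam as beam
    from apache_beam.io.filesystems import FileSystems

    def read_rows(path):
        # High fan-out step: one small input (a path) becomes many rows.
        with FileSystems.open(path) as handle:
            yield from parse_csv_lines(handle.read().decode("utf-8"))

    return (
        pipeline
        # Hypothetical stand-in for the Pub/Sub notification source.
        | "FilePaths" >> beam.Create(file_paths)
        | "ReadCsvRows" >> beam.FlatMap(read_rows)
        # Breaks fusion: rows are redistributed before downstream steps.
        | "BreakFusion" >> beam.Reshuffle()
        # Placeholder for the real per-row processing.
        | "ProcessRow" >> beam.Map(lambda row: row)
    )
```

The Reshuffle sits directly after the fan-out step because that is where the element count grows. Without it, `ReadCsvRows` and `ProcessRow` would likely be fused into one stage and run wherever the file path was delivered.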