Google Cloud Professional Data Engineer — Question 183

You are running a Dataflow streaming pipeline, with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipeline is Pub/Sub messages with notifications from Cloud Storage. One of the pipeline transforms reads CSV files and emits an element for every CSV line. The job performance is low, the pipeline is using only 10 workers, and you notice that the autoscaler is not spinning up additional workers. What should you do to improve performance?

Answer options

Correct answer: B

Explanation

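The correct answer is B. Dataflow fuses adjacent steps into a single stage, so the transform that reads each CSV file and emits one element per line runs in the same stage as the steps that follow it. All lines from one file are then processed on the worker that received that file's notification, and the available parallelism is limited by the number of incoming Pub/Sub messages rather than by the much larger number of CSV lines. The pipeline is using only 10 of the 1000 allowed workers, so the worker limit is not the constraint. Introducing a Reshuffle step after the high fan-out transform prevents fusion at that point and redistributes the emitted elements across workers, which gives the autoscaler parallel work to scale on. Option A is incorrect because Vertical Autoscaling changes the resources of each worker and does not address the low number of workers in use. Option C would not solve the underlying problem of the autoscaler not adding workers. Option D is also incorrect because it does not address the fusion issue.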
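The fix in option B can be sketched with the Apache Beam Python SDK. This is a minimal sketch under stated assumptions, not the pipeline from the question: `csv_lines`, `read_file`, and `expand_with_reshuffle` are hypothetical names, and `read_file` stands in for whatever code loads a Cloud Storage object as text. Only `beam.FlatMap` and `beam.Reshuffle` are actual Beam transforms.

```python
import csv
import io


def csv_lines(file_text):
    """Yield one parsed row per CSV line. This is the high fan-out step."""
    yield from csv.reader(io.StringIO(file_text))


def expand_with_reshuffle(file_paths, read_file):
    """Fan out each file into rows, then break fusion with Reshuffle.

    file_paths: PCollection of file paths taken from the notifications.
    read_file:  hypothetical callable, path -> file contents as str.
    """
    # Imported here so csv_lines stays usable without Beam installed.
    import apache_beam as beam

    return (
        file_paths
        | "ReadAndSplitCsv" >> beam.FlatMap(
            lambda path: csv_lines(read_file(path)))
        # Without this step, downstream transforms may be fused with
        # ReadAndSplitCsv and run on the worker that read the file.
        | "PreventFusion" >> beam.Reshuffle()
    )
```

Placing the Reshuffle directly after the fan-out step is the design choice that matters: everything applied to the returned PCollection starts a new stage, so the rows can be spread across workers instead of staying with the file that produced them.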