Google Cloud Professional Data Engineer — Question 191

You maintain ETL pipelines. You notice that a streaming pipeline running on Dataflow is taking a long time to process incoming data, which causes output delays. You also notice that Dataflow automatically optimized the pipeline graph and merged it into one step. You want to identify where the potential bottleneck is occurring. What should you do?

Answer options

Correct answer: A

Explanation

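The correct answer is A. Dataflow's fusion optimization has merged the pipeline's transforms into a single step, so the monitoring graph reports metrics only for the merged step and cannot show which transform is slow. Inserting a Reshuffle operation between processing steps prevents Dataflow from fusing them, so each step appears separately with its own execution details and the slow one can be identified. Options B and C focus on monitoring throughput and logging, which may not isolate the bottleneck effectively. Option D is unrelated to processing speed, because permission issues would typically cause failures rather than delays.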