You are testing a Dataflow pipeline to ingest and transform text files. The files are com…

Question

You are testing a Dataflow pipeline to ingest and transform text files. The files are gzip-compressed, errors are written to a dead-letter queue, and you are using
SideInputs to join data. You notice that the pipeline is taking longer to complete than expected. What should you do to expedite the Dataflow job?

Accepted Answer

Correct answer: D. Use CoGroupByKey instead of the SideInput. A side input is made available in full to every worker that processes the main input, so a large side input that does not fit in worker memory may be re-read repeatedly and slow the job down. CoGroupByKey instead keys both collections and joins them through a shuffle, which spreads the join work across workers. The other options do not directly address this cause: switching to Avro files or reducing the batch size might not significantly change processing time, and retrying failed records does not fix the underlying performance problem.
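
The difference between the two join styles can be sketched in plain Python. This is only a model of the semantics, not Beam code: the function names `side_input_join` and `co_group_by_key` and the sample data are hypothetical, chosen for illustration. In the Beam Python SDK the corresponding transform is `beam.CoGroupByKey()`, applied to a dict of keyed PCollections, and its output has the same `key -> {tag: [values]}` shape modelled here.

```python
from collections import defaultdict


def side_input_join(main, lookup_pairs):
    """Side-input style: the whole lookup collection is materialized
    as one dict that each worker needs in full to process its share
    of the main input."""
    lookup = dict(lookup_pairs)
    return [(key, value, lookup.get(key)) for key, value in main]


def co_group_by_key(tagged):
    """CoGroupByKey style: every input is keyed, and values from all
    inputs that share a key are grouped together under their tag.
    In a real pipeline this grouping is done by a distributed shuffle,
    so no single worker needs the whole lookup collection."""
    grouped = defaultdict(lambda: {tag: [] for tag in tagged})
    for tag, pairs in tagged.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return dict(grouped)


# Hypothetical sample data: parsed file lines and reference data,
# both keyed by the same user id.
lines = [("u1", "line-a"), ("u2", "line-b"), ("u1", "line-c")]
reference = [("u1", "Alice"), ("u2", "Bob")]

joined = co_group_by_key({"lines": lines, "reference": reference})
for key in sorted(joined):
    print(key, joined[key])
```

Both functions produce the same logical join on this small input. The point of the sketch is where the lookup data lives: in one in-memory structure per worker for the side-input style, versus partitioned by key for the grouped style.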

Google Cloud Professional Data Engineer — Question 291

Explanation