Google Cloud Professional Data Engineer — Question 291
You are testing a Dataflow pipeline that ingests and transforms text files. The files are gzip-compressed, errors are written to a dead-letter queue, and you are using SideInputs to join data. You notice that the pipeline is taking longer to complete than expected. What should you do to expedite the Dataflow job?
Answer options
- A. Switch to compressed Avro files.
- B. Reduce the batch size.
- C. Retry records that throw an error.
- D. Use CoGroupByKey instead of the SideInput.
Correct answer: D
Explanation
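CoGroupByKey joins two keyed collections through a distributed shuffle, so the join work is partitioned across workers and scales with the size of both inputs. A SideInput, by contrast, has to be made available to every worker that processes the main input; this works well for small lookup data but can slow the pipeline when the side input is large. Replacing the SideInput with CoGroupByKey therefore targets the join, which is presumably the bottleneck here. The other options do not address the join: switching to compressed Avro files or reducing the batch size might not significantly change processing time, and retrying records that throw an error adds work without fixing the underlying performance issue.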
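
The difference between the two join strategies can be sketched in plain Python. This is only a simplified model, not Apache Beam code: the functions `side_input_join` and `co_group_by_key`, the notion of "shards" standing in for workers, and the sample data are all assumptions made for illustration. The grouped result mirrors the general shape CoGroupByKey produces for tagged inputs (a key paired with a dictionary of value lists, one list per tag).

```python
from collections import defaultdict


def side_input_join(main_shards, lookup):
    """Broadcast-style join (model of a side input).

    Every shard, standing in for a worker, gets its own full copy of
    the lookup table. Returns the joined records and the total number
    of lookup entries held across all shards.
    """
    results = []
    entries_held = 0
    for shard in main_shards:
        local_copy = dict(lookup)  # each "worker" materializes the whole table
        entries_held += len(local_copy)
        for key, value in shard:
            if key in local_copy:
                results.append((key, (value, local_copy[key])))
    return results, entries_held


def co_group_by_key(tagged_inputs, num_partitions):
    """Shuffle-style join (model of CoGroupByKey).

    Records from every tagged input are routed to a partition by key,
    so each partition holds only the keys assigned to it.
    """
    tags = list(tagged_inputs)
    partitions = [
        defaultdict(lambda: {tag: [] for tag in tags})
        for _ in range(num_partitions)
    ]
    for tag, records in tagged_inputs.items():
        for key, value in records:
            partitions[hash(key) % num_partitions][key][tag].append(value)
    return [(key, groups) for part in partitions for key, groups in part.items()]


if __name__ == "__main__":
    shards = [[("u1", "order-1")], [("u2", "order-2"), ("u1", "order-3")]]
    users = {"u1": "Alice", "u2": "Bob"}

    joined, held = side_input_join(shards, users)
    print(sorted(joined), held)

    orders = [record for shard in shards for record in shard]
    grouped = co_group_by_key(
        {"orders": orders, "users": list(users.items())}, num_partitions=2
    )
    print(sorted(grouped, key=lambda kv: kv[0]))
```

In this model the broadcast join holds one full copy of the lookup table per shard, so its memory footprint grows with the number of workers, while the shuffle-style join stores each lookup entry once, in the partition that owns its key. That is the intuition behind preferring CoGroupByKey when the side input is large.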