Your team is responsible for developing and maintaining ETLs in your company. One of your…

Question

Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of errors in the input data, and you need to improve the reliability of the pipeline (including being able to reprocess all failing data).
What should you do?

Accepted Answer

Correct answer: D. Add a try... catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to Pub/Sub later.

D is correct because the try... catch block keeps a malformed row from failing the whole job, and the sideOutput captures each erroneous row in a separate PCollection. Storing that PCollection to Pub/Sub keeps the failing data available for later reprocessing without interrupting the main pipeline. Options A and B only address logging the errors and provide no mechanism for reprocessing, while option C sends erroneous data directly to Pub/Sub, which does not allow for later reprocessing within the main pipeline.
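The routing logic behind answer D can be sketched without any Beam dependency. The snippet below is only an illustration of the pattern, not a Dataflow implementation: the tag names `parsed` and `dead_letter`, the `transform` rule (parse JSON and require a numeric `amount`), and the `run` helper that stands in for the pipeline are all assumptions made for this example.

```python
import json
from typing import Iterable, Iterator, Tuple

MAIN = "parsed"              # hypothetical tag for successfully transformed rows
DEAD_LETTER = "dead_letter"  # hypothetical tag for rows that raised an error


def transform(raw: str) -> dict:
    """Hypothetical transformation: parse JSON and require a numeric 'amount'."""
    row = json.loads(raw)
    row["amount"] = float(row["amount"])
    return row


def process(raw: str) -> Iterator[Tuple[str, object]]:
    """Mimics a DoFn's process method: emit to the main or the dead-letter output."""
    try:
        row = transform(raw)
    except Exception as exc:  # broad on purpose: one bad row must not fail the job
        # Keep the original payload untouched so it can be replayed later.
        yield DEAD_LETTER, {"raw": raw, "error": repr(exc)}
    else:
        yield MAIN, row


def run(inputs: Iterable[str]) -> dict:
    """Stand-in for the pipeline: collect results into one list per tag."""
    outputs = {MAIN: [], DEAD_LETTER: []}
    for raw in inputs:
        for tag, value in process(raw):
            outputs[tag].append(value)
    return outputs


results = run(['{"amount": "12.5"}', "not json", '{"amount": "abc"}', '{"id": 1}'])
print(len(results[MAIN]), "rows parsed,", len(results[DEAD_LETTER]), "rows dead-lettered")
```

In the Apache Beam Python SDK, the same idea is usually expressed by yielding `beam.pvalue.TaggedOutput(tag, value)` from the DoFn and calling `.with_outputs(...)` on the `ParDo`, after which the dead-letter PCollection can be written to Pub/Sub. `sideOutput` is the older Dataflow SDK name for what Beam now calls additional (tagged) outputs.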

Google Cloud Professional Data Engineer — Question 303
