Google Cloud Professional Data Engineer — Question 121

You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt. You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?

Answer options

Correct answer: B

Explanation

B is correct: a ParDo transform applies a user-defined function to each element, and that function can emit nothing for an element it judges corrupt, which drops that element from the output. Options A and D do not directly remove corrupt data. Option C would separate the data but would not eliminate the corrupt elements from the output.
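
As an illustration of the ParDo approach, here is a minimal sketch using the Apache Beam Python SDK. The question does not say what makes a record corrupt, so this sketch assumes that payloads are UTF-8 JSON objects and that a record is corrupt if it fails to parse or lacks certain fields. The field names in `REQUIRED_FIELDS` and the names `parse_or_drop`, `add_filter_step`, and `DropCorrupt` are hypothetical and chosen only for this example.

```python
import json
from typing import Iterator

# Hypothetical schema: the fields a valid IoT record is assumed to carry.
REQUIRED_FIELDS = ("device_id", "timestamp", "reading")


def parse_or_drop(message: bytes) -> Iterator[dict]:
    """Yield the parsed record if it looks valid; yield nothing if corrupt."""
    try:
        record = json.loads(message.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return  # not valid UTF-8 or not valid JSON: drop the element
    if not isinstance(record, dict):
        return  # valid JSON but not an object: drop the element
    if any(field not in record for field in REQUIRED_FIELDS):
        return  # a required field is missing: drop the element
    yield record


def add_filter_step(messages):
    """Attach the filter to a PCollection of raw payloads (bytes)."""
    # Imported here so parse_or_drop can be exercised without Beam installed.
    import apache_beam as beam

    class DropCorrupt(beam.DoFn):
        def process(self, element):
            # A DoFn may emit zero or more outputs per input element;
            # emitting zero is what filters the corrupt element out.
            yield from parse_or_drop(element)

    return messages | "DropCorrupt" >> beam.ParDo(DropCorrupt())
```

In a pipeline, `add_filter_step` would sit between the Pub/Sub read and the BigQuery write, so only the records that pass the check reach the sink. Keeping the validity check in a plain function makes it easy to unit test separately from the pipeline.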