Google Cloud Professional Data Engineer — Question 84

Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period.
However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

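Answer options

A. Set a single global window to capture all the data.
B. Set sliding windows to capture all the lagged data.
C. Use watermarks and timestamps to capture the lagged data.
D. Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.
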
Correct answer: C

Explanation

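The correct answer is C. Event timestamps let Dataflow assign each element to the window in which the event actually occurred, not the one in which it happened to arrive, so out-of-order data still lands in the right window. The watermark is Dataflow's estimate of how far event time has progressed, that is, the point after which it expects no more data for a given window. The watermark decides when a window's results are emitted and, together with triggers and allowed lateness, how data that arrives after that point is handled.

Option A is incorrect because a single global window spans all of time: the watermark never passes its end, so it gives no time-based point at which results are complete, and it does nothing to place late elements.

Option B is not suitable because sliding windows only define how elements are grouped (overlapping intervals); choosing them does not by itself decide when a window is complete or what happens to data that arrives afterwards.

Option D emphasizes timestamps, which are necessary but not sufficient. Without watermarks the pipeline cannot tell when a window's data is complete, and therefore cannot tell which data is late.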
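To make the roles of timestamps and watermarks concrete, here is a minimal pure-Python toy model. It is not Apache Beam or Dataflow API code, only a sketch of the same ideas under stated assumptions: fixed 60-second event-time windows, a hypothetical heuristic watermark equal to the largest timestamp seen so far minus a fixed lag, and a hypothetical allowed lateness. The names `Event`, `run`, `watermark_lag`, and `allowed_lateness` are illustrative; a real Dataflow pipeline gets its watermark from the source and configures windows, triggers, and allowed lateness through Beam.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_time: int    # when the event happened (the element's timestamp), in seconds
    arrival_time: int  # when the pipeline receives it (processing time), in seconds


def run(events, window_size=60, watermark_lag=10, allowed_lateness=30):
    """Toy model of event-time fixed windows driven by a heuristic watermark.

    Returns (emissions, dropped), where emissions is a list of
    (window_start, pane_kind, cumulative_count) tuples.
    """
    counts = defaultdict(int)  # window start -> elements accumulated so far
    fired = set()              # windows that have already emitted a pane
    emissions, dropped = [], []
    watermark = float("-inf")  # "no more data older than this is expected"
    max_event_time = float("-inf")

    # Process elements in arrival order, which may differ from event-time order.
    for ev in sorted(events, key=lambda e: e.arrival_time):
        start = ev.event_time - ev.event_time % window_size
        end = start + window_size
        if watermark >= end + allowed_lateness:
            dropped.append(ev)  # beyond allowed lateness: discarded
            continue
        is_late = watermark >= end  # watermark already passed this window's end
        counts[start] += 1
        if is_late:
            # Late but within allowed lateness: emit an updated, cumulative pane.
            fired.add(start)
            emissions.append((start, "late", counts[start]))

        # Advance the heuristic watermark from the largest timestamp seen so far.
        max_event_time = max(max_event_time, ev.event_time)
        watermark = max(watermark, max_event_time - watermark_lag)

        # Emit the on-time pane for every window the watermark has now passed.
        for s in sorted(counts):
            if s not in fired and s + window_size <= watermark:
                fired.add(s)
                emissions.append((s, "on_time", counts[s]))

    # End of input: the watermark advances to infinity, closing remaining windows.
    for s in sorted(counts):
        if s not in fired:
            fired.add(s)
            emissions.append((s, "on_time", counts[s]))
    return emissions, dropped


if __name__ == "__main__":
    events = [Event(5, 6), Event(30, 31), Event(75, 80),
              Event(50, 85), Event(130, 140), Event(10, 150)]
    emissions, dropped = run(events)
    print(emissions)  # [(0, 'on_time', 2), (0, 'late', 3), (60, 'on_time', 1), (120, 'on_time', 1)]
    print(dropped)    # [Event(event_time=10, arrival_time=150)]
```

In this run the event with timestamp 50 arrives after the watermark (65) has passed the end of window [0, 60), so it is still placed in that window by its timestamp and produces a late pane with the updated count, roughly as accumulating late firings would in Beam. The event with timestamp 10 arrives when the watermark is 120, past the window end plus allowed lateness (90), so it is dropped. Timestamps alone (option D) would place both events correctly but give no rule for when to emit results or when to stop waiting.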