Google Cloud Professional Data Engineer — Question 201

You are running a streaming pipeline in Dataflow that uses hopping windows to group data as it arrives. You notice that some data arrives late but is not being marked as late data, which results in inaccurate aggregations downstream. You need a solution that captures the late data in the appropriate window. What should you do?

Answer options

Correct answer: A

Explanation

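A watermark is the system's estimate of the event time up to which all data is expected to have arrived. Elements that arrive after the watermark has passed the end of their window are treated as late, so relying on watermarks and permitting late data lets the pipeline recognize those elements and still add them to the window their event time belongs to, which is essential for accurate aggregations. Switching to tumbling or session windows changes how elements are grouped but not how lateness is detected, and simply enlarging the hopping window adds overhead without solving the root issue of late data recognition.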
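The interaction between a watermark, hopping windows, and late data can be illustrated with a small pure-Python simulation. This is only a sketch, not Dataflow or Apache Beam code: the function names, the `max_delay` watermark heuristic (largest event time seen minus a fixed delay), and the `allowed_lateness` parameter are assumptions introduced for illustration.

```python
from collections import defaultdict


def hopping_windows(ts, size, period):
    """Return every [start, end) hopping window that contains event time ts."""
    last_start = ts - (ts % period)
    return [(s, s + size) for s in range(last_start, ts - size, -period)]


def aggregate(events, size, period, max_delay, allowed_lateness):
    """Sum values per hopping window, classifying late elements.

    events: (event_time, value) pairs in arrival order.
    The watermark here is a hypothetical heuristic: the largest event
    time seen so far minus max_delay.
    """
    sums = defaultdict(int)
    late = []     # behind the watermark, but still counted
    dropped = []  # window already expired, not counted
    watermark = float("-inf")
    for ts, value in events:
        for window in hopping_windows(ts, size, period):
            end = window[1]
            if end + allowed_lateness <= watermark:
                # The window closed for good before this element arrived.
                dropped.append((ts, window))
                continue
            if end <= watermark:
                # Late, but within the allowed lateness for this window.
                late.append((ts, window))
            sums[window] += value
        watermark = max(watermark, ts - max_delay)
    return dict(sums), late, dropped


# The element with event time 20 arrives after the element with event time 100.
events = [(10, 1), (40, 1), (100, 1), (20, 1)]
strict = aggregate(events, size=60, period=30, max_delay=0, allowed_lateness=0)
lenient = aggregate(events, size=60, period=30, max_delay=0, allowed_lateness=60)
```

With no allowed lateness, the out-of-order element is dropped from both windows it belongs to and the sum for window `(0, 60)` stays at 2. With an allowed lateness of 60, the same element is flagged as late and still counted in `(0, 60)`, raising that sum to 3, while the older window `(-30, 30)` has already expired.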