Google Cloud Professional Data Engineer — Question 286
You are analyzing the price of a company's stock. Every 5 seconds, you need to compute a moving average of the past 30 seconds' worth of data. You are reading data from Pub/Sub and using Dataflow to conduct the analysis. How should you set up your windowed pipeline?
Answer options
- A. Use a fixed window with a duration of 5 seconds. Emit results by setting the following trigger: AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))
- B. Use a fixed window with a duration of 30 seconds. Emit results by setting the following trigger: AfterWatermark.pastEndOfWindow().plusDelayOf(Duration.standardSeconds(5))
- C. Use a sliding window with a duration of 5 seconds. Emit results by setting the following trigger: AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))
- D. Use a sliding window with a duration of 30 seconds and a period of 5 seconds. Emit results by setting the following trigger: AfterWatermark.pastEndOfWindow()
Correct answer: D
Explanation
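The correct answer is D. A sliding window with a duration of 30 seconds and a period of 5 seconds starts a new window every 5 seconds, and each window covers the most recent 30 seconds of data, which is exactly a moving average. The AfterWatermark.pastEndOfWindow() trigger emits each window's result once the watermark passes the end of that window. Option A uses fixed 5-second windows, so each result covers only 5 seconds of data; delaying the trigger by 30 seconds postpones the output but does not widen the window. Option B uses fixed 30-second windows, which do not overlap, so it produces one result every 30 seconds instead of every 5. Option C uses a sliding window, but its duration is only 5 seconds, so no window ever contains 30 seconds of data.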
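To make the overlap in option D concrete, here is a minimal plain-Java sketch of sliding-window assignment. It does not use the Beam SDK; the class name `SlidingWindowSketch`, the helpers `windowStartsFor` and `windowAverage`, and the sample prices are all hypothetical and exist only for illustration. In the Beam Java SDK, the equivalent windowing is normally declared with `SlidingWindows.of(Duration.standardSeconds(30)).every(Duration.standardSeconds(5))`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simulates how a 30 s window with a 5 s period
// assigns elements to overlapping windows. Not Beam code.
final class SlidingWindowSketch {

    // Start times (ms) of every window [start, start + size) that contains the timestamp.
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long periodMs) {
        List<Long> starts = new ArrayList<>();
        // Latest period-aligned window start at or before the timestamp.
        long lastStart = timestampMs - Math.floorMod(timestampMs, periodMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= periodMs) {
            starts.add(start);
        }
        return starts;
    }

    // Mean of the prices whose timestamps fall in [windowStartMs, windowStartMs + sizeMs).
    static double windowAverage(long[] timestampsMs, double[] prices,
                                long windowStartMs, long sizeMs) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < timestampsMs.length; i++) {
            if (timestampsMs[i] >= windowStartMs && timestampsMs[i] < windowStartMs + sizeMs) {
                sum += prices[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        long sizeMs = 30_000L;   // window duration: 30 seconds
        long periodMs = 5_000L;  // a new window starts every 5 seconds

        // An element at t = 42 s belongs to 30 / 5 = 6 overlapping windows.
        System.out.println(windowStartsFor(42_000L, sizeMs, periodMs));
        // [40000, 35000, 30000, 25000, 20000, 15000]

        // Hypothetical price ticks (timestamps in ms).
        long[] ts = {1_000L, 12_000L, 27_000L, 33_000L};
        double[] px = {100.0, 102.0, 104.0, 110.0};

        // Window [0 s, 30 s) averages the first three ticks.
        System.out.println(windowAverage(ts, px, 0L, sizeMs)); // 102.0
        // Window [5 s, 35 s) drops the first tick and picks up the fourth.
        System.out.println(windowAverage(ts, px, 5_000L, sizeMs));
    }
}
```

Each element lands in six windows because the duration is six times the period. With a fixed 30-second window (option B) every element would land in exactly one window, and a new average would appear only once every 30 seconds.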