Google Cloud Professional Data Engineer — Question 36
Your company is setting up data pipelines for a campaign. For all Google Cloud Pub/Sub streaming data, an important business requirement is to periodically identify the inputs and their timings during the campaign. The engineers have decided to use windowing and transforms in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all streaming inserts. What is the most likely cause of this problem?
Answer options
- A. They have not assigned the timestamp, which causes the job to fail
- B. They have not set the triggers to accommodate the data coming in late, which causes the job to fail
- C. They have not applied a global windowing function, which causes the job to fail when the pipeline is created
- D. They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
Correct answer: D
Explanation
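The correct answer is D. Pub/Sub delivers an unbounded stream, and by default every element of a PCollection is assigned to the single global window. A global window over an unbounded source never closes, so an aggregating transform such as GroupByKey has no point at which it can emit results. Apache Beam rejects this combination when the pipeline is constructed unless a non-global windowing function (or a non-default trigger) has been applied first. Option A is incorrect because elements read from Pub/Sub already carry a timestamp by default, so not assigning one explicitly does not make the job fail. Option B is incorrect because handling of late data affects when results are emitted and whether late elements are dropped, not whether the pipeline can be built. Option C is incorrect because the global window is already the default, so applying it explicitly changes nothing.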
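
To make the idea of a non-global window concrete, here is a minimal plain-Python sketch of fixed-window assignment. It is not the Apache Beam or Dataflow API. The 60-second window size, the function names, and the sample events are all assumptions chosen for illustration. It shows only how each event timestamp maps to a finite window that can be grouped and emitted on its own, which is what a single never-ending global window cannot offer on a stream.

```python
from collections import defaultdict

WINDOW_SIZE_SECONDS = 60  # assumed window length, for illustration only


def window_start(event_time: float, size: int = WINDOW_SIZE_SECONDS) -> int:
    """Return the start of the fixed window that contains event_time."""
    return int(event_time // size) * size


def group_by_fixed_window(events, size: int = WINDOW_SIZE_SECONDS):
    """Group (event_time, payload) pairs by the fixed window they fall into."""
    windows = defaultdict(list)
    for event_time, payload in events:
        windows[window_start(event_time, size)].append(payload)
    return dict(windows)


# Hypothetical campaign inputs: (event time in seconds, payload), arriving out of order.
events = [(5, "click"), (130, "view"), (61, "click"), (59, "signup")]
grouped = group_by_fixed_window(events)
```

Each key in `grouped` is the start of a window with a definite end, so the inputs and their timings can be reported per window. In a real Dataflow pipeline the windowing transform plays this role before any grouping step.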