Question

You are designing a pipeline that publishes application events to a Pub/Sub topic. Although message ordering is not important, you need to be able to aggregate events across disjoint hourly intervals before loading the results to BigQuery for analysis. What technology should you use to process and load this data to BigQuery while ensuring that it will scale with large volumes of events?

Accepted Answer

Correct answer: D. Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.

D is correct because a streaming Dataflow job processes events continuously as they arrive and scales its workers with event volume. Tumbling (fixed) windows are non-overlapping, so one-hour tumbling windows map directly onto the disjoint hourly intervals the question asks for. Options A and B rely on Cloud Functions, which are not designed for windowed aggregation over large event streams. Option C is a batch solution and does not provide the continuous processing that large event volumes call for.
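To make the tumbling-window idea concrete, here is a minimal plain-Python sketch of how events fall into disjoint hourly buckets. It is not Dataflow code: the function names `window_start` and `count_per_window`, the `(epoch_seconds, event_type)` event shape, and the sample events are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 3600  # one-hour tumbling (fixed) windows


def window_start(event_ts: float, size: int = WINDOW_SECONDS) -> int:
    """Return the start (epoch seconds) of the tumbling window containing event_ts."""
    return int(event_ts // size) * size


def count_per_window(events):
    """Count events per (window_start, event_type).

    `events` is an iterable of (epoch_seconds, event_type) pairs; this shape
    is hypothetical and chosen only to keep the sketch small.
    """
    counts = Counter()
    for ts, event_type in events:
        counts[(window_start(ts), event_type)] += 1
    return dict(counts)


# Hypothetical sample events, deliberately out of order: each event belongs
# to exactly one window regardless of arrival order.
events = [
    (datetime(2024, 1, 1, 10, 59, 59, tzinfo=timezone.utc).timestamp(), "login"),
    (datetime(2024, 1, 1, 10, 5, 0, tzinfo=timezone.utc).timestamp(), "login"),
    (datetime(2024, 1, 1, 11, 0, 0, tzinfo=timezone.utc).timestamp(), "login"),
]
result = count_per_window(events)
```

Because each timestamp maps to exactly one window start, the windows never overlap, which is why message ordering does not affect the hourly totals. In an Apache Beam pipeline running on Dataflow, the equivalent windowing step would typically be `beam.WindowInto(window.FixedWindows(3600))` applied before the aggregation.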

Google Cloud Professional Data Engineer — Question 287
