Google Cloud Professional Data Engineer — Question 198
You are designing a real-time system for a ride-hailing app that identifies areas with high demand for rides so that available drivers can be rerouted to meet the demand. The system ingests data from multiple sources to Pub/Sub, processes the data, and stores the results for visualization and analysis in real-time dashboards. The data sources include driver location updates every 5 seconds and app-based booking events from riders. The data processing involves real-time aggregation of supply and demand data for the last 30 seconds, every 2 seconds, and storing the results in a low-latency system for visualization. What should you do?
Answer options
- A. Group the data by using a tumbling window in a Dataflow pipeline, and write the aggregated data to Memorystore.
- B. Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to Memorystore.
- C. Group the data by using a session window in a Dataflow pipeline, and write the aggregated data to BigQuery.
- D. Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to BigQuery.
Correct answer: B
Explanation
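The correct option is B. The requirement to aggregate the last 30 seconds of data every 2 seconds describes overlapping windows: a hopping (sliding) window with a 30-second length and a 2-second period. Memorystore is an in-memory store, so it fits the low-latency requirement for the dashboards. Option A is incorrect because tumbling windows do not overlap; a 30-second tumbling window would emit a result only once every 30 seconds, and a 2-second tumbling window would cover only 2 seconds of data. Option C is incorrect because session windows are bounded by gaps in activity rather than by fixed intervals, and because it writes to BigQuery. Option D uses the right window type but writes to BigQuery, an analytical warehouse that is a poorer fit than an in-memory store for results that refresh every 2 seconds.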
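To make the difference between hopping and tumbling windows concrete, here is a minimal plain-Python sketch of window assignment. It is not Dataflow code: the function names `hopping_windows` and `tumbling_window` are hypothetical, timestamps are assumed to be whole seconds, and windows are assumed to be aligned to time zero. In an Apache Beam pipeline the same grouping would typically be expressed with `beam.WindowInto(beam.window.SlidingWindows(size=30, period=2))`.

```python
def hopping_windows(timestamp, size=30, period=2):
    """Return every [start, end) window of length `size`, started every
    `period` seconds, that contains `timestamp` (all values in seconds)."""
    # Latest window start at or before the timestamp.
    start = timestamp - (timestamp % period)
    windows = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return windows


def tumbling_window(timestamp, size=30):
    """Return the single non-overlapping [start, end) window for `timestamp`."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


event_time = 61  # hypothetical booking event, 61 seconds into the stream
wins = hopping_windows(event_time)
print(len(wins))                    # 15 overlapping windows (30 / 2)
print(wins[0], wins[-1])            # (60, 90) (32, 62)
print(tumbling_window(event_time))  # (60, 90)
```

Each event contributes to 15 overlapping 30-second aggregates, so a fresh result can be emitted every 2 seconds. With a tumbling window the same event lands in exactly one window, which is why option A cannot satisfy both the 30-second lookback and the 2-second refresh.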