You are designing a real-time system for a ride hailing app that identifies areas with hi…

Question

You are designing a real-time system for a ride-hailing app that identifies areas with high demand for rides so that available drivers can be rerouted to meet the demand. The system ingests data from multiple sources to Pub/Sub, processes the data, and stores the results for visualization and analysis in real-time dashboards. The data sources include driver location updates every 5 seconds and app-based booking events from riders. The data processing involves real-time aggregation of supply and demand data for the last 30 seconds, every 2 seconds, and storing the results in a low-latency system for visualization. What should you do?

Accepted Answer

Correct answer: B. Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to Memorystore.

A hopping window fits this scenario because the aggregates must cover overlapping intervals: each result spans the last 30 seconds (the window length), and a new result is produced every 2 seconds (the hop period). Option A is incorrect because tumbling windows do not overlap, so a 30-second tumbling window would yield a result only once every 30 seconds. Options C and D are incorrect because they suggest using a session window and writing to BigQuery: a session window is bounded by gaps in activity rather than by a fixed duration, and BigQuery does not meet the low-latency requirement specified in the scenario.
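To make the overlap concrete, here is a minimal pure-Python sketch of how a hopping window assigns one event to several windows. It is an illustration only, not Dataflow code: the function names, the event tuples, and the sample timestamps are hypothetical, and only the 30-second size and 2-second period come from the question. Apache Beam, the SDK that Dataflow runs, calls this a sliding window, and the equivalent windowing there would be along the lines of `SlidingWindows(size=30, period=2)`.

```python
from collections import defaultdict

WINDOW_SIZE = 30  # seconds of data covered by each aggregate (from the question)
PERIOD = 2        # seconds between consecutive window starts (from the question)


def hopping_windows(ts, size=WINDOW_SIZE, period=PERIOD):
    """Return the start of every window [start, start + size) that contains ts."""
    last_start = (ts // period) * period      # latest window start at or before ts
    first_start = last_start - size + period  # earliest window that still covers ts
    return list(range(first_start, last_start + period, period))


def aggregate(events, size=WINDOW_SIZE, period=PERIOD):
    """Count events per (window_start, kind); events are (timestamp_seconds, kind)."""
    counts = defaultdict(int)
    for ts, kind in events:
        for start in hopping_windows(ts, size, period):
            counts[(start, kind)] += 1
    return dict(counts)


# Hypothetical sample data: integer timestamps in seconds plus an event kind.
events = [(31, "booking"), (33, "booking"), (34, "driver_location")]
result = aggregate(events)
```

Each event lands in size / period = 15 windows, which is what allows a fresh 30-second aggregate every 2 seconds. With a tumbling window, each event would belong to exactly one window instead.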

Google Cloud Professional Data Engineer — Question 198
