Google Cloud Professional Data Engineer — Question 253
You analyze user clickstream data to personalize content recommendations. The data arrives continuously and must be processed with low latency, including transformations such as sessionization (grouping clicks by user within a time window) and aggregation of user activity. You need a scalable solution that handles millions of events per second and is resilient to late-arriving data. What should you do?
Answer options
- A. Use Firebase Realtime Database for ingestion and storage, and Cloud Run functions for processing and analytics.
- B. Use Cloud Data Fusion for ingestion and transformation, and Cloud SQL for storage and analytics.
- C. Use Pub/Sub for ingestion, Dataflow with Apache Beam for processing, and BigQuery for storage and analytics.
- D. Use Cloud Storage for ingestion, Dataproc with Apache Spark for batch processing, and BigQuery for storage and analytics.
Correct answer: C
Explanation
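The correct answer is C because each component matches a requirement in the question. Pub/Sub provides horizontally scalable, real-time ingestion. Dataflow with Apache Beam processes the stream with low latency and supports session windows for sessionization, along with watermarks, triggers, and allowed lateness for handling late-arriving data. BigQuery provides scalable storage and analytics and accepts streaming writes. Option A is not suitable because Firebase Realtime Database is an application database rather than a high-throughput ingestion layer, and Cloud Run functions have no built-in windowing or late-data handling. Option B is not suitable because Cloud SQL is a relational database that is not designed to absorb or analyze millions of events per second. Option D is built around batch processing: landing data in Cloud Storage and running Spark batch jobs on Dataproc adds latency that does not fit a continuous, low-latency stream.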
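To make the sessionization requirement concrete, the following is a minimal pure-Python sketch of gap-based session grouping, the idea behind session windows in a streaming engine. It is an illustration only, not Dataflow or Apache Beam code: the function name `sessionize`, the 30-minute gap, and the sample clicks are all assumptions chosen for the example. Because events are ordered by event time rather than arrival order, a click that arrives late still lands in the session it belongs to.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sessionize(
    events: List[Tuple[str, float]], gap_seconds: float
) -> Dict[str, List[List[float]]]:
    """Group (user_id, event_time) pairs into per-user sessions.

    A new session starts when the time since the user's previous click
    is at least gap_seconds. Hypothetical helper for illustration.
    """
    # Collect event timestamps per user.
    by_user: Dict[str, List[float]] = defaultdict(list)
    for user_id, event_time in events:
        by_user[user_id].append(event_time)

    sessions: Dict[str, List[List[float]]] = {}
    for user_id, stamps in by_user.items():
        # Sort by event time so out-of-order arrivals are placed correctly.
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] >= gap_seconds:
                user_sessions.append([ts])  # gap reached: start a new session
            else:
                user_sessions[-1].append(ts)  # still within the current session
        sessions[user_id] = user_sessions
    return sessions


# Sample clicks in arrival order; the last one for u1 arrives late
# (event time 20.0 arrives after event time 2000.0). Values are made up.
clicks = [("u1", 0.0), ("u1", 50.0), ("u2", 10.0), ("u1", 2000.0), ("u1", 20.0)]
result = sessionize(clicks, gap_seconds=1800)
```

With these sample values, user `u1` ends up with two sessions and the late click at event time 20.0 joins the first one. A streaming engine does the same grouping incrementally and uses watermarks to decide how long to wait for late events, which this batch-style sketch does not model.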