Google Cloud Professional Data Engineer — Question 253

You analyze user clickstream data to personalize content recommendations. The data arrives continuously and needs to be processed with low latency, including transformations such as sessionization (grouping clicks by user within a time window) and aggregation of user activity. You need to identify a scalable solution to handle millions of events each second and be resilient to late-arriving data. What should you do?

Answer options

Correct answer: C

Explanation

The correct answer is C. It uses Pub/Sub for real-time ingestion, Dataflow with Apache Beam for low-latency stream processing, and BigQuery for scalable storage and analytics. Each of these services scales horizontally, which makes the pipeline suitable for millions of events per second. Beam's event-time windowing (session windows, watermarks, and allowed lateness) covers both the sessionization requirement and late-arriving data. Options A and B do not provide the necessary low-latency processing, while option D is oriented toward batch processing and is not well suited to continuous data streams.
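
To make the sessionization requirement concrete, here is a minimal plain-Python sketch of gap-based session grouping. It is only an illustration of the idea and not Dataflow or Apache Beam code. The function name `sessionize`, the 30-minute gap, and the sample clicks are assumptions made for this example. Events are ordered by event time rather than arrival order, which is why a click that arrives late still lands in the session it belongs to.

```python
from collections import defaultdict


def sessionize(events, gap_seconds=1800):
    """Group (user_id, event_time) clicks into per-user sessions.

    A new session starts when the gap between two consecutive clicks of the
    same user is at least gap_seconds. Hypothetical helper for illustration.
    """
    # Collect event times per user, regardless of arrival order.
    by_user = defaultdict(list)
    for user_id, event_time in events:
        by_user[user_id].append(event_time)

    sessions = {}
    for user_id, stamps in by_user.items():
        # Sort by event time so late-arriving clicks fall into place.
        stamps.sort()
        user_sessions = []
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] >= gap_seconds:
                # Gap is large enough: close the session and start a new one.
                user_sessions.append(current)
                current = [ts]
            else:
                current.append(ts)
        user_sessions.append(current)
        sessions[user_id] = user_sessions
    return sessions


# Sample clicks in arrival order; ("u1", 300) arrives late (assumed data).
clicks = [("u1", 0), ("u1", 600), ("u2", 100), ("u1", 5000), ("u1", 300)]
result = sessionize(clicks)
```

In a streaming pipeline the input is unbounded, so a simple sort over all events is not possible. A stream processor instead uses watermarks to decide when a session is probably complete and an allowed-lateness setting to decide how long it still accepts late clicks for that session.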