Google Cloud Associate Data Practitioner — Question 25
Your organization needs to implement near real-time analytics for thousands of events arriving each second in Pub/Sub. The incoming messages require transformations. You need to configure a pipeline that processes, transforms, and loads the data into BigQuery while minimizing development time. What should you do?
Answer options
- A. Use a Google-provided Dataflow template to process the Pub/Sub messages, perform transformations, and write the results to BigQuery.
- B. Create a Cloud Data Fusion instance and configure Pub/Sub as a source. Use Data Fusion to process the Pub/Sub messages, perform transformations, and write the results to BigQuery.
- C. Load the data from Pub/Sub into Cloud Storage using a Cloud Storage subscription. Create a Dataproc cluster, use PySpark to perform transformations in Cloud Storage, and write the results to BigQuery.
- D. Use Cloud Run functions to process the Pub/Sub messages, perform transformations, and write the results to BigQuery.
Correct answer: A
Explanation
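The correct answer is A. The Google-provided Pub/Sub to BigQuery Dataflow templates run as managed streaming jobs that are launched by supplying parameters rather than by writing pipeline code, and the transformations can be supplied as a small user-defined function (UDF). This meets the near real-time requirement with the least development effort. Option B can work, but it requires provisioning and operating a Cloud Data Fusion instance and building the pipeline in it, which adds setup time. Option C is a batch design: staging messages in Cloud Storage and transforming them with PySpark on a Dataproc cluster adds latency, cluster management, and custom code. Option D is feasible, but the transformation logic, error handling, and BigQuery writes would all have to be coded and maintained by hand, so it takes longer to deliver than a template.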
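As a rough illustration of how little has to be written for option A, the sketch below assembles a `gcloud dataflow jobs run` command for the classic "Pub/Sub Subscription to BigQuery" template. The template path and parameter names are given as recalled from that template's documentation and should be checked against the current version. The job name, project, subscription, table, bucket, and UDF function name are hypothetical placeholders, not values from the question.

```python
import shlex


def build_template_launch_command(job_name, region, subscription, table_spec,
                                  udf_gcs_path, udf_function_name):
    """Return the gcloud argument list that launches the template job.

    Nothing is executed here; the function only builds the command.
    """
    # Template parameters (assumed names, verify against the template docs).
    parameters = {
        "inputSubscription": subscription,
        "outputTableSpec": table_spec,
        # Optional JavaScript UDF that transforms each message before loading.
        "javascriptTextTransformGcsPath": udf_gcs_path,
        "javascriptTextTransformFunctionName": udf_function_name,
    }
    joined = ",".join(f"{key}={value}" for key, value in parameters.items())
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--region", region,
        "--gcs-location",
        f"gs://dataflow-templates-{region}/latest/PubSub_Subscription_to_BigQuery",
        "--parameters", joined,
    ]


if __name__ == "__main__":
    # All identifiers below are hypothetical examples.
    command = build_template_launch_command(
        job_name="events-to-bq",
        region="us-central1",
        subscription="projects/my-project/subscriptions/events-sub",
        table_spec="my-project:analytics.events",
        udf_gcs_path="gs://my-bucket/udf/transform.js",
        udf_function_name="transform",
    )
    print(shlex.join(command))
```

The only custom logic in this approach is the UDF file itself, which is why the template involves less development than the hand-written services in options C and D.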