Google Cloud Associate Data Practitioner — Question 3
Your company is building a near real-time streaming pipeline to process JSON telemetry data from small appliances. You need to process messages arriving at a Pub/Sub topic, capitalize letters in the serial number field, and write results to BigQuery. You want to use a managed service and write a minimal amount of code for underlying transformations. What should you do?
Answer options
- A. Use a Pub/Sub to BigQuery subscription, write results directly to BigQuery, and schedule a transformation query to run every five minutes.
- B. Use a Pub/Sub to Cloud Storage subscription, write a Cloud Run service that is triggered when objects arrive in the bucket, performs the transformations, and writes the results to BigQuery.
- C. Use the “Pub/Sub to BigQuery” Dataflow template with a UDF, and write the results to BigQuery.
- D. Use a Pub/Sub push subscription, write a Cloud Run service that accepts the messages, performs the transformations, and writes the results to BigQuery.
Correct answer: C
Explanation
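The correct answer is C. Dataflow is a managed service, and the Google-provided “Pub/Sub to BigQuery” template already handles reading from Pub/Sub and streaming rows into BigQuery. The only code you write is a short JavaScript user-defined function (UDF) that uppercases the serial number field in each message, which satisfies both the near real-time and the minimal-code requirements.
- Option A writes the raw messages to BigQuery and only fixes the serial number when the scheduled query runs, so the corrected data can lag by up to five minutes. That is a batch transformation, not near real-time processing.
- Option B adds an extra hop through Cloud Storage, where messages are batched into files before anything processes them, and it requires you to write and maintain a Cloud Run service that parses the files, transforms the records, and loads them into BigQuery.
- Option D can process messages in near real time, but you must write and operate the whole Cloud Run service yourself (receiving pushes, parsing, transforming, and writing to BigQuery), which is far more code than a UDF.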
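To make the "minimal amount of code" point concrete, here is a sketch of the kind of UDF option C relies on. It assumes the telemetry field is named serial_number and uses the function name capitalizeSerialNumber; both names are hypothetical and are not given in the question. The template passes each message payload to the function as a JSON string and expects a JSON string back.

```javascript
/**
 * Sketch of a UDF for the "Pub/Sub to BigQuery" Dataflow template.
 * Input:  one Pub/Sub message payload as a JSON string.
 * Output: the transformed record as a JSON string.
 * Assumption: the serial number lives in a field called "serial_number"
 * (hypothetical name; use the field name from your own schema).
 */
function capitalizeSerialNumber(inJson) {
  var record = JSON.parse(inJson);

  // Only touch the field when it is present and is a string,
  // so messages without a serial number pass through unchanged.
  if (typeof record.serial_number === 'string') {
    record.serial_number = record.serial_number.toUpperCase();
  }

  return JSON.stringify(record);
}
```

In a typical setup, this file would be uploaded to a Cloud Storage bucket and referenced through the template's javascriptTextTransformGcsPath and javascriptTextTransformFunctionName parameters when the job is launched. Everything else (subscription handling, scaling, BigQuery writes) is handled by the template.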