Google Cloud Professional Data Engineer — Question 53
You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?
Answer options
- A. Batch job, PubSubIO, side-inputs
- B. Streaming job, PubSubIO, JdbcIO, side-outputs
- C. Streaming job, PubSubIO, BigQueryIO, side-inputs
- D. Streaming job, PubSubIO, BigQueryIO, side-outputs
Correct answer: C
Explanation
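The correct answer is C. Cloud Pub/Sub is an unbounded source, so the pipeline must run as a streaming job, which rules out option A. PubSubIO reads the incoming messages, and BigQueryIO both reads the reference data and writes the enriched results, so the JdbcIO in option B is not needed. Because the reference data fits in memory on a single worker, it can be passed to the enrichment transform as a side input, which makes the whole dataset available to each worker as it processes elements. Side outputs, named in options B and D, do the opposite job: they emit additional output collections from a transform and do not bring reference data into it.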
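The shape of answer C can be sketched with the Beam Python SDK, where `ReadFromPubSub`, `ReadFromBigQuery` and `WriteToBigQuery` play the roles of the Java `PubSubIO` and `BigQueryIO` classes named in the question. This is a minimal illustration under assumptions, not a reference solution: the field names `product_id` and `product_name`, the JSON message format, and the `topic`, `ref_query` and `output_table` arguments are all hypothetical, and the output table is assumed to exist already.

```python
import json


def enrich(message: bytes, reference: dict) -> dict:
    """Join one Pub/Sub message with the in-memory reference lookup.

    Assumes (hypothetically) that messages are JSON objects with a
    'product_id' field and that the reference maps id -> name.
    """
    event = json.loads(message.decode("utf-8"))
    # Unknown keys get None instead of failing the element.
    event["product_name"] = reference.get(event["product_id"])
    return event


def build_pipeline(topic: str, ref_query: str, output_table: str):
    """Assemble the streaming pipeline; all three arguments are placeholders."""
    # Imported here so enrich() can be exercised without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        StandardOptions,
    )

    options = PipelineOptions()
    # Pub/Sub is an unbounded source, so run in streaming mode.
    options.view_as(StandardOptions).streaming = True

    pipeline = beam.Pipeline(options=options)

    # Bounded read of the small reference table, shaped as (key, value)
    # pairs so it can be viewed as a dict side input.
    reference = (
        pipeline
        | "ReadReference" >> beam.io.ReadFromBigQuery(
            query=ref_query, use_standard_sql=True)
        | "ToKeyValue" >> beam.Map(
            lambda row: (row["product_id"], row["product_name"]))
    )

    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic=topic)
        # The side input is passed as an extra argument to enrich().
        | "Enrich" >> beam.Map(
            enrich, reference=beam.pvalue.AsDict(reference))
        | "WriteEnriched" >> beam.io.WriteToBigQuery(
            output_table,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
    return pipeline
```

Calling `build_pipeline(...).run()` with a real topic, query and table would start the job. The enrichment logic is kept in a plain function so it can be checked locally, for example with `enrich(b'{"product_id": "p1"}', {"p1": "Widget"})`.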