Google Cloud Associate Data Practitioner — Question 40
You are developing a data ingestion pipeline to load small CSV files into BigQuery from Cloud Storage. You want to load these files upon arrival to minimize data latency. You want to accomplish this with minimal cost and maintenance. What should you do?
Answer options
- A. Use the bq command-line tool within a Cloud Shell instance to load the data into BigQuery.
- B. Create a Cloud Composer pipeline to load new files from Cloud Storage to BigQuery and schedule it to run every 10 minutes.
- C. Create a Cloud Run function to load the data into BigQuery that is triggered when data arrives in Cloud Storage.
- D. Create a Dataproc cluster to pull CSV files from Cloud Storage, process them using Spark, and write the results to BigQuery.
Correct answer: C
Explanation
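The correct answer is C. A Cloud Run function with a Cloud Storage trigger runs automatically whenever a new object is written to the bucket, so each file is loaded shortly after it arrives. The function is serverless and scales to zero when no files arrive, which keeps both cost and maintenance low for small, infrequent files. Option A is a manual step: someone has to run the bq command in Cloud Shell for every file, so nothing is loaded on arrival. Option B polls on a schedule, so a file can wait up to 10 minutes before it is loaded, and a Cloud Composer environment keeps running (and incurring cost) even when no files arrive. Option D adds a Dataproc cluster and Spark to what is a simple load job, making it the most complex and costly choice.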
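As an illustration of option C, the following is a minimal sketch of such a function in Python, not a definitive implementation. It assumes the function is triggered by a Cloud Storage "object finalized" event whose payload carries `bucket` and `name` fields, that the CSV files have a header row, and that schema autodetection is acceptable. The helper name `gcs_uri_for_csv`, the entry point name `load_csv_to_bigquery`, and the `BQ_TABLE_ID` environment variable are hypothetical and chosen only for this example.

```python
import os
from typing import Optional


def gcs_uri_for_csv(event_data: dict) -> Optional[str]:
    """Return the gs:// URI of a newly written CSV object, or None for anything else."""
    bucket = event_data.get("bucket", "")
    name = event_data.get("name", "")
    if not bucket or not name.lower().endswith(".csv"):
        return None
    return f"gs://{bucket}/{name}"


def load_csv_to_bigquery(cloud_event) -> None:
    """Entry point (hypothetical name): handle one Cloud Storage CloudEvent."""
    # Imported here so the helper above can be used without the client library installed.
    from google.cloud import bigquery

    uri = gcs_uri_for_csv(cloud_event.data)
    if uri is None:
        return  # ignore objects that are not CSV files

    # Assumption: the destination table is supplied as "project.dataset.table"
    # through a hypothetical BQ_TABLE_ID environment variable.
    table_id = os.environ["BQ_TABLE_ID"]

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes each file starts with a header row
        autodetect=True,      # assumes inferred column types are good enough
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    client = bigquery.Client()
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the job; raises if the load fails
```

In a real deployment the entry point would typically be registered with the Functions Framework's `@functions_framework.cloud_event` decorator and bound to the bucket through a Cloud Storage trigger; both are left out here to keep the sketch short. The function submits a BigQuery load job rather than streaming rows, which suits small files that arrive as whole objects.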