Google Cloud Professional Data Engineer — Question 310
An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?
Answer options
- A. Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source.
- B. Use a standard Dataflow pipeline to store the raw data in BigQuery, and then transform the format later when the data is used.
- C. Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format.
- D. Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format.
Correct answer: D
Explanation
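The correct answer is D. BigQuery has no built-in reader for a proprietary format, so the pipeline needs a custom Apache Beam connector that can parse the flight data as it arrives. Running that connector in a Dataflow streaming pipeline satisfies the streaming requirement, and Dataflow is a managed service, so there is no cluster to provision or maintain. Avro is a compact, binary, row-oriented format that carries its own schema, which makes it cheaper to move and ingest than a text format such as CSV. The other options fall short: A runs periodic batch jobs, so it does not stream the data at all. B stores the raw proprietary data in BigQuery and defers the transformation, so the conversion cost is paid every time the data is used, and a standard pipeline still has no way to read the proprietary format. C requires a Dataproc cluster, Hive is a batch-oriented query engine rather than a streaming tool, and CSV is verbose and untyped.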
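
To make the "custom connector" idea concrete, here is a minimal sketch of the decoding step such a connector would perform, using only the Python standard library. The record layout (`RECORD`), the field names, and the `decode_records` function are all hypothetical, since the question does not describe the actual flight data format.

```python
import struct
from typing import Iterator

# Hypothetical fixed-width record layout (an assumption for illustration only):
#   8-byte ASCII flight id, uint64 epoch milliseconds,
#   float32 altitude in metres, float32 ground speed in m/s.
RECORD = struct.Struct("<8sQff")


def decode_records(blob: bytes) -> Iterator[dict]:
    """Yield one BigQuery-style row dict per complete record in `blob`."""
    # Ignore a trailing partial record, as can happen at a chunk boundary.
    usable = len(blob) - len(blob) % RECORD.size
    for flight_id, ts_ms, altitude, speed in RECORD.iter_unpack(blob[:usable]):
        yield {
            "flight_id": flight_id.rstrip(b"\x00").decode("ascii"),
            "timestamp_ms": ts_ms,
            "altitude_m": altitude,
            "speed_mps": speed,
        }


if __name__ == "__main__":
    sample = RECORD.pack(b"AB123", 1700000000000, 10500.0, 250.5)
    for row in decode_records(sample):
        print(row)
```

In an actual Beam pipeline, a function like this would typically be wrapped in a `DoFn` or a custom source, and its output passed to a BigQuery sink such as `apache_beam.io.WriteToBigQuery`. The exact wiring depends on where the flight data arrives from, which the question leaves unspecified.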