Google Cloud Professional Data Engineer — Question 246
You need to load a dataset with multiple terabytes of clickstream data into BigQuery. The data arrives each day as compressed JSON files in a Cloud Storage bucket. You need a low-cost, programmatic, and scalable solution to load the data into BigQuery. What should you do?
Answer options
- A. Create an external table in BigQuery pointing to the Cloud Storage bucket and run the INSERT INTO ... SELECT * FROM external_table command.
- B. Use the BigQuery Data Transfer Service from Cloud Storage.
- C. Create a Cloud Run function to run a Python script to read and parse each JSON file, and use the BigQuery streaming insert API.
- D. Use Cloud Data Fusion to create a pipeline to load the JSON files into BigQuery.
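Correct answer: B
Explanation
The correct answer is B. The BigQuery Data Transfer Service for Cloud Storage runs scheduled BigQuery load jobs against files in a bucket. It can pick up each day's compressed JSON files with no servers or pipeline code to maintain. It is programmatic, because transfers can be created and managed through the API, the bq tool, or the client libraries. It scales with BigQuery's managed load infrastructure. It is also low-cost: transfers from Cloud Storage carry no Data Transfer Service charge, and batch loading into BigQuery is free, so you pay mainly for storage.
The other options cost more or add work. A: an external table avoids a load job, but INSERT INTO ... SELECT * FROM external_table is a query. It is billed on the bytes it reads under on-demand pricing (or consumes reserved slots), and you still need your own scheduler to run it every day. C: the streaming insert API is billed by the volume of data inserted, and you must write and operate the parsing code yourself. That makes it neither the cheapest nor the simplest way to move terabytes of daily batch files. D: Cloud Data Fusion charges for its instance and runs pipelines on Dataproc clusters, which is extra cost and complexity that a direct load from Cloud Storage does not need.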
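Option B can be set up programmatically. The Python sketch below is illustrative only and makes several assumptions: it uses the google-cloud-bigquery-datatransfer client library, it assumes the files are newline-delimited JSON (gzip-compressed is fine for load jobs), and every project, dataset, table, and bucket name is a hypothetical placeholder. It creates a transfer config with the `google_cloud_storage` data source that runs once a day.

```python
"""Sketch: a daily BigQuery Data Transfer Service load from Cloud Storage.

Assumptions: newline-delimited JSON files; all names passed in are
hypothetical placeholders, not values from the question.
"""


def build_gcs_transfer_params(data_path: str, table: str) -> dict:
    """Return the `params` map for a google_cloud_storage transfer of JSON files."""
    return {
        # A wildcard path lets one transfer pick up every new file in the prefix.
        "data_path_template": data_path,
        # Destination table inside the transfer's destination dataset.
        "destination_table_name_template": table,
        # Newline-delimited JSON; gzip-compressed files are accepted by load jobs.
        "file_format": "JSON",
        # Append each run's files to the existing table.
        "write_disposition": "APPEND",
    }


def create_daily_transfer(project_id: str, dataset_id: str, data_path: str, table: str):
    """Create a transfer config that loads matching files once every 24 hours."""
    # Imported here so the params builder above can be used without the library.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=dataset_id,
        display_name="daily-clickstream-load",  # hypothetical display name
        data_source_id="google_cloud_storage",
        params=build_gcs_transfer_params(data_path, table),
        schedule="every 24 hours",
    )
    # Each scheduled run issues a BigQuery load job, not a billed query.
    return client.create_transfer_config(
        parent=client.common_project_path(project_id),
        transfer_config=config,
    )
```

A call such as `create_daily_transfer("my-project", "analytics", "gs://my-bucket/clickstream/*.json.gz", "clickstream")` (placeholder names) needs application default credentials and the BigQuery Data Transfer API enabled. Keeping the parameter map in its own function makes it easy to review or reuse with the bq tool or REST API.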