Google Cloud Associate Data Practitioner — Question 39
Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?
Answer options
- A. Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
- B. Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
- C. Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
- D. Use the bq load command to load the Parquet files into BigQuery, and perform SQL joins to analyze the data.
Correct answer: C
Explanation
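The correct answer is C. External tables let BigQuery query the Parquet files in place in Cloud Storage using standard SQL and join them to native BigQuery tables in the same query. No data movement or new infrastructure is needed, which suits a quick, one-time analysis. Option A requires provisioning and managing a Dataproc cluster and writing PySpark code rather than plain SQL. Option B requires deploying a Cloud Data Fusion instance and building a pipeline, which is more setup than a one-off query justifies. Option D requires loading a petabyte of data into BigQuery first, which takes time and duplicates storage for data that will be queried only once.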
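To make option C concrete, here is a minimal Python sketch that builds the two SQL statements involved: a `CREATE EXTERNAL TABLE` statement over the Parquet files and a join against a native table. The project, dataset, table, bucket, and column names are all hypothetical placeholders, not values from the question, and the helper functions are illustrative rather than part of any Google library.

```python
def external_table_ddl(table: str, uris: list[str]) -> str:
    """Build BigQuery DDL that defines an external table over Parquet files.

    `table` and `uris` are hypothetical; substitute your own names.
    """
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        "OPTIONS (\n"
        "  format = 'PARQUET',\n"
        f"  uris = [{uri_list}]\n"
        ");"
    )


def join_query(external_table: str, native_table: str, key: str) -> str:
    """Build a query that counts log rows per key by joining the
    external table (files in Cloud Storage) to a native BigQuery table."""
    return (
        f"SELECT n.{key}, COUNT(*) AS log_rows\n"
        f"FROM `{native_table}` AS n\n"
        f"JOIN `{external_table}` AS e\n"
        f"  ON n.{key} = e.{key}\n"
        f"GROUP BY n.{key};"
    )


def run(sql: str):
    """Optionally execute a statement with the google-cloud-bigquery client.

    Assumes the package is installed and default credentials are configured.
    """
    from google.cloud import bigquery  # imported lazily; not needed to build SQL

    return bigquery.Client().query(sql).result()


# Hypothetical names used only for illustration.
ddl = external_table_ddl(
    "my_project.logs.app_logs_ext",
    ["gs://example-bucket/logs/*.parquet"],
)
query = join_query(
    "my_project.logs.app_logs_ext",
    "my_project.crm.customers",
    "customer_id",
)
```

Printing `ddl` and `query` shows the statements, which could be pasted into the BigQuery console or passed to `run`. The point of the sketch is that the whole workflow is two SQL statements, with no cluster, pipeline, or load job to set up first.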