Google Cloud Professional Data Engineer — Question 51
You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?
Answer options
- A. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.
- B. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
- C. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
- D. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.
Correct answer: C
Explanation
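The correct answer is C. Keeping the CSV files in Cloud Storage costs far less for 20 TB than Cloud Bigtable, which bills for provisioned nodes on top of storage. It also leaves the files where other engines (for example, Spark or Hive on Dataproc) can read them directly. A permanent external table in BigQuery stores its definition in a dataset while the data stays in Cloud Storage. Many users can therefore run repeated aggregate queries against it without loading or duplicating the data, and access can be shared through dataset permissions. Options A and B move the data into Cloud Bigtable, a NoSQL wide-column store built for low-latency key lookups rather than analytical aggregation. That takes the data out of Cloud Storage, where the other engines expect it, and the HBase shell in option A offers no SQL for computing aggregates. Option D uses temporary external tables, which exist only for the query that defines them. Every user would have to redefine the table for each query, so they suit one-off ad hoc queries rather than repeated use by many users.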
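To make option C concrete, the sketch below builds a BigQuery `CREATE EXTERNAL TABLE` statement for CSV files in Cloud Storage. It is a minimal illustration, not a prescribed setup. The project, dataset, table, bucket, and column names are hypothetical placeholders, and the helper `external_table_ddl` is our own. The DDL options used (`format`, `uris`, `skip_leading_rows`) are standard BigQuery external-table options.

```python
"""Sketch: generate BigQuery DDL for a permanent external table over CSV
files in Cloud Storage. All project, dataset, table, bucket, and column
names are hypothetical placeholders."""

from typing import Sequence, Tuple


def external_table_ddl(
    table_id: str,
    columns: Sequence[Tuple[str, str]],
    uris: Sequence[str],
    skip_leading_rows: int = 1,
) -> str:
    """Return a CREATE EXTERNAL TABLE statement for CSV data.

    Only the table definition is stored in the BigQuery dataset (a
    permanent external table); the CSV files themselves stay in Cloud
    Storage, where other engines can still read them.
    """
    # One "name TYPE" entry per column, indented inside the parentheses.
    column_list = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    # Quote each gs:// URI; wildcards such as *.csv are allowed in the URI.
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table_id}` (\n"
        f"  {column_list}\n"
        f")\n"
        f"OPTIONS (\n"
        f"  format = 'CSV',\n"
        f"  uris = [{uri_list}],\n"
        f"  skip_leading_rows = {skip_leading_rows}\n"
        f");"
    )


if __name__ == "__main__":
    ddl = external_table_ddl(
        table_id="my-project.analytics.events_ext",  # hypothetical table
        columns=[("user_id", "STRING"), ("event_date", "DATE"), ("amount", "NUMERIC")],
        uris=["gs://my-bucket/events/*.csv"],  # hypothetical bucket and path
    )
    print(ddl)
```

The printed statement could be run once in the BigQuery console or with `bq query --use_legacy_sql=false`. After that, any user with access to the dataset and the bucket can issue aggregate queries such as `SELECT user_id, SUM(amount) FROM ... GROUP BY user_id` against the table, with no per-query table definition.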