Google Cloud Professional Data Engineer — Question 56
You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run, you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?
Answer options
- A. Create a separate table for each ID.
- B. Use the LIMIT keyword to reduce the number of rows returned.
- C. Recreate the table with a partitioning column and clustering column.
- D. Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.
Correct answer: C
Explanation
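The correct answer is C. Partitioning the table on the timestamp column lets BigQuery prune partitions that fall outside the filtered time range, and clustering on the ID column lets it skip storage blocks that do not contain the requested IDs, so only a small part of the table is read. Because the existing WHERE clause already filters on both columns, the queries need little or no change. Option A would require rewriting every query to target a per-ID table, and a large number of tables is hard to manage. Option B does not help because LIMIT restricts the rows returned, not the data read; on a table that is not clustered, the query is still billed for a full scan. Option D only sets a ceiling: a query that would exceed the limit fails instead of running, so the amount of data the query needs to scan is unchanged.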
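
As a rough illustration of option C, the table could be recreated with a CREATE TABLE ... AS SELECT statement along the following lines. This is a minimal sketch, not the exam's reference solution: the dataset and table names (mydataset.events, mydataset.events_partitioned) and the column names (event_ts as a TIMESTAMP column, id) are hypothetical, since the question does not name them.

```sql
-- Hypothetical names: mydataset.events is the original, unpartitioned table;
-- event_ts (TIMESTAMP) and id are the columns used in the WHERE clause.
CREATE TABLE mydataset.events_partitioned
PARTITION BY DATE(event_ts)   -- daily partitions derived from the timestamp
CLUSTER BY id                 -- co-locate rows with the same ID within each partition
AS
SELECT *
FROM mydataset.events;

-- An existing query of this shape needs no changes to its WHERE clause:
-- the event_ts range allows partition pruning, and the id filter
-- allows BigQuery to skip blocks within the remaining partitions.
SELECT *
FROM mydataset.events_partitioned
WHERE event_ts >= TIMESTAMP '2024-01-01'
  AND event_ts <  TIMESTAMP '2024-01-02'
  AND id = 'abc';
```

If the new table then takes over the original table's name, the existing queries can stay exactly as written, and a fresh dry run should report fewer bytes processed.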