Google Cloud Professional Data Engineer — Question 56

You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

Answer options

Correct answer: C

Explanation

The correct answer is C. Recreating the table with a partitioning column on the timestamp and a clustering column on the ID lets BigQuery prune partitions that fall outside the filtered time range and, within the remaining partitions, skip blocks that do not contain the filtered IDs, so far less data is scanned. Existing queries need little or no change because their WHERE clause already filters on those two columns. Options A and B do not address the scanning problem: creating a separate table for each ID would be hard to manage and would force changes to existing queries, and LIMIT only restricts the rows returned, not the data scanned by the query. Option D caps the bytes billed but does not reduce the data the query has to scan.
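
As a rough illustration of option C, the statements below sketch how the table might be recreated with partitioning and clustering. The dataset, table, and column names (mydataset.events, events_partitioned, event_ts, id) and the literal filter values are hypothetical, since the question does not name them; the sketch also assumes event_ts is a TIMESTAMP column.

```sql
-- Hypothetical names: mydataset.events is the original table,
-- event_ts is its TIMESTAMP column and id is its ID column.
CREATE TABLE mydataset.events_partitioned
PARTITION BY DATE(event_ts)   -- one partition per day of event_ts
CLUSTER BY id                 -- rows sorted by id within each partition
AS
SELECT * FROM mydataset.events;

-- The existing WHERE clause is unchanged; only the table name differs.
SELECT *
FROM mydataset.events_partitioned
WHERE event_ts >= TIMESTAMP '2024-01-01 00:00:00+00'
  AND event_ts <  TIMESTAMP '2024-01-02 00:00:00+00'
  AND id = 'abc123';
```

Running the second query again with bq query --dry_run should report a much smaller number of bytes than the original full scan. If the new table takes over the original table's name after the copy, the existing queries would not need to change at all.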