Google Cloud Professional Data Engineer — Question 329

A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created with ingest-date partitioning. Over time, the query processing time has increased. You need to implement a change that would improve query performance in BigQuery. What should you do?

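Answer options

A. Implement clustering on the ingest date column.
B. Implement clustering on the package-tracking ID column.
C. Tier older data to Cloud Storage.
D. Re-create the table with partitioning on the package delivery date.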

Correct answer: B

Explanation

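Clustering on the package-tracking ID column (option B) improves query performance. BigQuery sorts a clustered table's data by the clustering column within each partition, so it can skip storage blocks that cannot contain the requested tracking IDs. This reduces the data scanned by queries that filter or aggregate by package, which is how analysts follow a package through its lifecycle. Clustering on the ingest date (option A) adds little. The table is already partitioned by ingest date, so every row in a daily partition shares the same date and the clustering order leaves nothing further to prune. Tiering older data to Cloud Storage (option C) means that data can only be queried through external tables, which are generally slower than native BigQuery storage. Re-creating the table with partitioning on the package delivery date (option D) requires migrating the existing data and still does not help queries that look up packages by tracking ID.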
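The pruning argument can be illustrated with a toy model. This is a simplified Python simulation under stated assumptions, not BigQuery's actual storage format. Rows from one ingest-date partition are sorted by a clustering column and cut into fixed-size blocks, and each block keeps min/max statistics per column. A filter on `tracking_id` then only has to read the blocks whose range could contain the value. The schema (`tracking_id`, `ingest_date`, `seq`), the block size, and the data volumes are hypothetical.

```python
import random
from datetime import date


def make_rows(n_packages, events_per_package, ingest_day):
    # Hypothetical schema: one tracking event per row. All rows share one
    # ingest date, i.e. they all sit in a single ingest-date partition.
    rows = []
    for pkg in range(n_packages):
        for seq in range(events_per_package):
            rows.append({"tracking_id": f"PKG{pkg:05d}",
                         "ingest_date": ingest_day,
                         "seq": seq})
    # Events from many packages arrive interleaved, so shuffle deterministically.
    random.Random(0).shuffle(rows)
    return rows


def build_blocks(rows, cluster_key, block_size):
    """Sort rows by the clustering column, cut them into blocks, and keep
    per-block (min, max) statistics for every column."""
    ordered = sorted(rows, key=lambda r: r[cluster_key])  # stable sort
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        stats = {col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
                 for col in chunk[0]}
        blocks.append((chunk, stats))
    return blocks


def count_scanned(blocks, column, value):
    """Count blocks that cannot be skipped: their [min, max] range for
    `column` might contain `value`."""
    return sum(1 for _, stats in blocks
               if stats[column][0] <= value <= stats[column][1])


rows = make_rows(n_packages=1000, events_per_package=20,
                 ingest_day=date(2024, 1, 1))
by_tracking_id = build_blocks(rows, "tracking_id", block_size=500)
by_ingest_date = build_blocks(rows, "ingest_date", block_size=500)

target = "PKG00042"
print("clustered on tracking_id:",
      count_scanned(by_tracking_id, "tracking_id", target), "of", len(by_tracking_id))
print("clustered on ingest_date:",
      count_scanned(by_ingest_date, "tracking_id", target), "of", len(by_ingest_date))
```

With clustering on `tracking_id`, the lookup for `PKG00042` reads 1 of the 40 blocks, because all of that package's rows are contiguous. With clustering on `ingest_date`, every row in the partition has the same date. The stable sort therefore leaves rows in arrival order, and almost every block's `tracking_id` range spans the target. This mirrors why option A gains little on a table that is already partitioned by ingest date.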