A shipping company has live package-tracking data that is sent to an Apache Kafka stream…

Question

A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created with ingest-date partitioning. Over time, the query processing time has increased. You need to implement a change that would improve query performance in BigQuery. What should you do?

Accepted Answer

Correct answer: B. Implement clustering in BigQuery on the package-tracking ID column. Analysts follow each package through its lifecycle, so their queries filter or group on the tracking ID. Clustering on that column stores rows with the same tracking ID close together, which lets BigQuery skip storage blocks that cannot contain the requested IDs and so reduces the amount of data scanned. Clustering on the ingest date (option A) adds little, because the table is already partitioned by ingest date and the analysis is not driven by when a row arrived. Tiering older data to Cloud Storage (option C) may complicate access and slow down queries that still need the older records. Recreating the table with partitioning on the package delivery date (option D) does not address the actual bottleneck, which is looking up rows by tracking ID.
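The effect of clustering can be illustrated with a toy model. The sketch below is not BigQuery code and does not reflect BigQuery's actual storage format; it only assumes that data is stored in fixed-size blocks with min/max metadata per block, and that a query can skip any block whose range excludes the requested tracking ID. The names `make_blocks`, `blocks_scanned`, and `simulate`, and the block size of 100 rows, are hypothetical choices made for illustration.

```python
import random

BLOCK_SIZE = 100  # rows per block; an arbitrary value for this toy model


def make_blocks(rows, block_size=BLOCK_SIZE):
    """Split rows into fixed-size blocks and record each block's min/max tracking ID."""
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        ids = [row["tracking_id"] for row in chunk]
        blocks.append({"min": min(ids), "max": max(ids), "rows": chunk})
    return blocks


def blocks_scanned(blocks, tracking_id):
    """Count blocks whose [min, max] range could contain the requested ID."""
    return sum(1 for b in blocks if b["min"] <= tracking_id <= b["max"])


def simulate(num_packages=500, events_per_package=20, target=123, seed=0):
    """Return (blocks scanned without clustering, blocks scanned with clustering)."""
    rng = random.Random(seed)
    rows = [
        {"tracking_id": pkg, "event": ev}
        for pkg in range(num_packages)
        for ev in range(events_per_package)
    ]
    # Streaming arrival order: events from different packages are interleaved.
    rng.shuffle(rows)
    unclustered = make_blocks(rows)
    # Clustering modelled as sorting by tracking ID before writing blocks.
    clustered = make_blocks(sorted(rows, key=lambda row: row["tracking_id"]))
    return blocks_scanned(unclustered, target), blocks_scanned(clustered, target)


if __name__ == "__main__":
    unclustered_count, clustered_count = simulate()
    print("blocks scanned without clustering:", unclustered_count)
    print("blocks scanned with clustering:", clustered_count)
```

In this model, the interleaved layout forces a lookup for one tracking ID to touch most blocks, because almost every block spans a wide range of IDs. The sorted layout confines the same lookup to the few blocks that hold that ID. That is the intuition behind choosing the tracking ID as the clustering column.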

Google Cloud Professional Data Engineer — Question 329
