Google Cloud Professional Data Engineer — Question 193
You are creating a data model in BigQuery that will hold retail transaction data. Your two largest tables, sales_transaction_header and sales_transaction_line, have a tightly coupled immutable relationship. These tables are rarely modified after load and are frequently joined when queried. You need to model the sales_transaction_header and sales_transaction_line tables to improve the performance of data analytics queries. What should you do?
Answer options
- A. Create a sales_transaction table that holds the sales_transaction_header information as rows and the sales_transaction_line rows as nested and repeated fields.
- B. Create a sales_transaction table that holds the sales_transaction_header and sales_transaction_line information as rows, duplicating the sales_transaction_header data for each line.
- C. Create a sales_transaction table that stores the sales_transaction_header and sales_transaction_line data as a JSON data type.
- D. Create separate sales_transaction_header and sales_transaction_line tables and, when querying, specify the sales_transaction_line first in the WHERE clause.
Correct answer: A
Explanation
The correct answer is A because nesting the sales_transaction_line rows within the sales_transaction_header as repeated fields optimizes query performance and takes advantage of BigQuery's strengths in handling nested structures. Option B increases data redundancy by duplicating the header information for each line, which is inefficient. Option C, while it can store the data, does not leverage the advantages of BigQuery's columnar storage for analytics. Option D does not address the structural optimization needed for performance enhancement.