Google Cloud Professional Data Engineer — Question 23

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will be sent only once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

Answer options

Correct answer: D

Explanation

The correct answer is D. The ROW_NUMBER window function assigns a sequential integer, starting at 1, to the rows within each partition. When the data is partitioned by the unique ID, every copy of a duplicated row falls into the same partition, so keeping only the rows numbered 1 returns exactly one row per ID. The other options either do not filter out duplicates reliably or do not use the unique ID correctly for this purpose.
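
The deduplication pattern can be sketched as follows. This is a minimal illustration, not a BigQuery job: it runs the query against an in-memory SQLite database (Python's standard sqlite3 module, which supports window functions in SQLite 3.25 and later) so that it is runnable locally. The table name events, the columns unique_id, event_timestamp, and value, and the sample rows are all hypothetical. Ordering each partition by event_timestamp descending is an assumption made here so that the latest copy is the one kept.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for the warehouse table.
# Table and column names are hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (unique_id TEXT, event_timestamp TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01T00:00:00", 10),
        ("a", "2024-01-01T00:00:00", 10),  # duplicate delivery of row "a"
        ("b", "2024-01-01T00:05:00", 20),
        ("c", "2024-01-01T00:07:00", 30),
        ("c", "2024-01-01T00:07:00", 30),  # duplicate delivery of row "c"
    ],
)

# Number the rows inside each unique_id partition, then keep only row 1.
# The ORDER BY inside OVER() is an assumption: it keeps the latest copy.
DEDUP_SQL = """
SELECT unique_id, event_timestamp, value
FROM (
  SELECT
    unique_id,
    event_timestamp,
    value,
    ROW_NUMBER() OVER (
      PARTITION BY unique_id
      ORDER BY event_timestamp DESC
    ) AS row_num
  FROM events
)
WHERE row_num = 1
ORDER BY unique_id
"""

deduped = conn.execute(DEDUP_SQL).fetchall()
for row in deduped:
    print(row)
```

The five inserted rows collapse to one row per unique ID. In BigQuery the inner and outer SELECT would have the same shape, with the table reference replaced by the actual dataset and table name.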