Google Cloud Professional Data Engineer — Question 63

Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?

Answer options

Correct answer: A

Explanation

Assigning a globally unique identifier (GUID) to each data entry lets every entry be distinctly identified, so a re-transmitted entry can be recognized and discarded by comparing identifiers alone. This makes it the most efficient method for deduplication. The other options, while potentially effective, involve additional computation and storage overhead that could slow down the deduplication process.
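
The GUID approach can be illustrated with a minimal Python sketch. It is not part of the question and makes several assumptions: the `Transmission` record, its field names, and the helpers `new_entry_guid` and `deduplicate` are hypothetical, and the sketch assumes the sending system generates the GUID once per data entry and reuses it on every re-transmission, so that only the transmission timestamp changes. A production pipeline would keep the set of seen identifiers in a durable store rather than in memory.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Transmission:
    """Hypothetical record shape: one inventory entry as received by ingestion."""
    guid: str      # assigned once per data entry, reused on re-transmission (assumption)
    payload: dict  # the inventory fields
    sent_at: str   # transmission timestamp; may differ between re-transmissions


def new_entry_guid() -> str:
    """Generate a random (version 4) UUID to act as the entry's GUID."""
    return str(uuid.uuid4())


def deduplicate(transmissions):
    """Keep the first transmission seen for each GUID, preserving arrival order."""
    seen = set()
    unique = []
    for t in transmissions:
        if t.guid in seen:
            continue  # same GUID already ingested: this is a re-transmission
        seen.add(t.guid)
        unique.append(t)
    return unique
```

For example, if an entry is sent at midnight and re-sent five minutes later with the same GUID, `deduplicate` keeps only the first copy even though the two `sent_at` values differ. Each check is a single set lookup on the identifier, with no need to compare or hash the full payload.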