Databricks Certified Data Engineer Professional — Question 37

A data engineer is configuring a pipeline that may receive late-arriving, duplicate records.

In addition to deduplicating records within the batch, which of the following approaches allows the data engineer to deduplicate incoming data against previously processed records as it is inserted into a Delta table?

Answer options

Correct answer: C

Explanation

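The correct answer is C. An insert-only merge is a MERGE statement that has only a WHEN NOT MATCHED THEN INSERT clause and a match condition on a unique key. Each incoming record is compared against the rows already in the target table: if its key already exists, the record is skipped, and if it does not, the record is inserted. This removes duplicates of previously processed records, including late-arriving ones, without modifying existing rows. Option A describes functionality that Delta Lake does not provide, while B, D, and E do not deduplicate incoming records against previously processed data at insert time.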
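
The behavior described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical target table `events`, a staged batch view `new_events_deduped` that has already been deduplicated within the batch, and a unique key `event_id`; none of these names come from the question. The SQL string shows the general shape of an insert-only merge, while the pure-Python functions only model its semantics on lists of dicts and do not call Spark or Delta Lake.

```python
# Shape of an insert-only merge in Delta Lake SQL.
# Table, view, and column names are hypothetical, chosen for illustration.
# Only an insert clause is present, so existing target rows are never modified.
INSERT_ONLY_MERGE_SQL = """
MERGE INTO events AS t
USING new_events_deduped AS s
ON t.event_id = s.event_id
WHEN NOT MATCHED THEN INSERT *
"""


def dedupe_batch(batch, key):
    """Keep the first record seen for each key within a single batch."""
    first_seen = {}
    for row in batch:
        first_seen.setdefault(row[key], row)
    return list(first_seen.values())


def insert_only_merge(target, batch, key):
    """Model of an insert-only merge on in-memory rows.

    Appends to `target` only those batch rows whose key is not already
    present, and returns the rows that were inserted.
    """
    existing_keys = {row[key] for row in target}
    inserted = [
        row for row in dedupe_batch(batch, key)
        if row[key] not in existing_keys
    ]
    target.extend(inserted)
    return inserted


# Previously processed data, then a batch containing a late-arriving
# duplicate of event 1 and two copies of a new event 2.
target = [{"event_id": 1, "v": "a"}]
batch = [
    {"event_id": 1, "v": "late copy"},
    {"event_id": 2, "v": "b"},
    {"event_id": 2, "v": "b again"},
]
inserted = insert_only_merge(target, batch, "event_id")
```

In this model only event 2 is inserted, and the existing row for event 1 is left untouched. Running the same batch a second time inserts nothing, which is the property that makes the approach safe for reprocessed or late-arriving data.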