Databricks Certified Data Engineer Professional — Question 160

A data engineer is configuring a pipeline that will potentially see late-arriving, duplicate records.

In addition to de-duplicating records within the batch, which of the following approaches allows the data engineer to deduplicate data against previously processed records as it is inserted into a Delta table?

Answer options

Correct answer: C

Explanation

The correct answer is C. An insert-only merge matches each incoming record against the target Delta table on a unique key and inserts only the records that have no match, so a record that was already processed is skipped even if it arrives late. Because the merge compares the source against the target and not against itself, the batch must still be de-duplicated first. The other options do not compare incoming records against rows already in the table at write time, so they cannot prevent duplicates across batches.
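
The behaviour described above can be sketched in plain Python. This is a minimal model of the semantics, not Delta Lake code: the helper names (insert_only_merge_sql, dedupe_batch, simulate_insert_only_merge), the table names, and the event_id key are all hypothetical. The SQL string shows the general shape of a Delta Lake MERGE that has only a WHEN NOT MATCHED clause; the two functions below it model what that statement does to the rows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def insert_only_merge_sql(target: str, source: str, key: str) -> str:
    """Build a MERGE statement that has only a NOT MATCHED clause.

    Identifiers are interpolated without quoting, for illustration only.
    """
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def dedupe_batch(batch: List[Row], key: str) -> List[Row]:
    """Step 1: keep the first record per key within the incoming batch."""
    seen = set()
    unique = []
    for row in batch:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def simulate_insert_only_merge(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Step 2: insert source rows whose key is absent from the target.

    Matched rows are left untouched, because there is no WHEN MATCHED clause.
    Source rows are compared only against the target as it was before the
    merge, which is why step 1 is still needed.
    """
    existing = {row[key] for row in target}
    return target + [row for row in source if row[key] not in existing]


# Hypothetical data: event 2 arrives late a second time, event 3 is duplicated
# inside the batch.
target_rows = [{"event_id": 1, "v": "a"}, {"event_id": 2, "v": "b"}]
batch_rows = [
    {"event_id": 2, "v": "b-late"},
    {"event_id": 3, "v": "c"},
    {"event_id": 3, "v": "c-dup"},
]

merged = simulate_insert_only_merge(
    target_rows, dedupe_batch(batch_rows, "event_id"), "event_id"
)
statement = insert_only_merge_sql("events", "new_events", "event_id")
```

Here the late copy of event 2 is skipped because its key already exists in the target, and only one copy of event 3 is inserted because the batch was de-duplicated before the merge. Skipping dedupe_batch would let both copies of event 3 through, since neither matches a row in the target.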