Question

A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources.
The churn prediction model used by the ML team is fairly stable in production. The team is only interested in making predictions on records that have changed in the past 24 hours.
Which approach would simplify the identification of these changed records?
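To show why the current nightly overwrite makes changed records hard to find, here is a rough toy model in plain Python. It is not the Delta Lake API. It assumes that an overwrite appears in a change feed as a delete of every old row plus an insert of every new row, whether or not the row's values changed. The columns `tenure_months` and `plan` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, Row]  # (change type, row image)


def overwrite_change_feed(old: Dict[int, Row], new: Dict[int, Row]) -> List[Change]:
    """Toy model (assumption): a full overwrite replaces every row, so each old
    row is reported as a delete and each new row as an insert, whether or not
    its values actually changed."""
    changes: List[Change] = [("delete", row) for row in old.values()]
    changes += [("insert", row) for row in new.values()]
    return changes


# Hypothetical customer rows keyed by customer_id.
yesterday: Dict[int, Row] = {
    1: {"customer_id": 1, "tenure_months": 12, "plan": "basic"},
    2: {"customer_id": 2, "tenure_months": 30, "plan": "premium"},
    3: {"customer_id": 3, "tenure_months": 5, "plan": "basic"},
}
# Only customer 3 actually changed overnight.
today = {k: dict(v) for k, v in yesterday.items()}
today[3]["plan"] = "premium"

feed = overwrite_change_feed(yesterday, today)
# Finding the real change still requires diffing the whole table.
actually_changed = [k for k in today if today[k] != yesterday.get(k)]
print(len(feed), actually_changed)  # 6 [3]
```

Six change records are produced for one real change, so a downstream job would still have to compare full snapshots to find customer 3.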

Accepted Answer

Correct answer: E. Replace the current overwrite logic with a merge statement to modify only those records that have changed; write logic to make predictions on the changed records identified by the change data feed.

E is correct because a MERGE statement can be written to modify only the rows that are new or whose values changed. When Delta Lake's Change Data Feed is enabled on the table, each of those row-level changes is recorded with its commit version and timestamp, so the ML team can read just the changes committed in the past 24 hours and score only those records. A nightly overwrite, by contrast, rewrites every row, so nothing distinguishes changed customers from unchanged ones. Option A processes all records, which is unnecessary, while B and C do not specifically target only the changed records. Option D may help with tracking changes but does not by itself identify the records that need new predictions.
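The merge-plus-change-feed flow in answer E can be sketched as a toy in-memory model. On Databricks the real pieces would be a `MERGE INTO` statement with a `WHEN MATCHED AND <values differ> THEN UPDATE` condition, the table property `delta.enableChangeDataFeed = true`, and a read through `table_changes(...)` or `spark.read.format("delta").option("readChangeFeed", "true")` with a starting version or timestamp. The class `ToyTable` and the function `rows_to_score` below are hypothetical stand-ins that only mirror the `_change_type` and `_commit_version` columns of Delta's change feed; deletes are not modeled.

```python
from typing import Dict, List

Row = Dict[str, object]


class ToyTable:
    """Hypothetical in-memory stand-in for a Delta table with change data feed on."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: Dict[object, Row] = {}
        self.version = 0
        self.changes: List[Row] = []

    def merge(self, source: List[Row]) -> None:
        """Upsert: insert new keys, update keys whose values differ, skip identical rows."""
        self.version += 1  # each merge is one commit
        for src in source:
            k = src[self.key]
            old = self.rows.get(k)
            if old is None:
                self._log("insert", src)
            elif old != src:
                self._log("update_preimage", old)
                self._log("update_postimage", src)
            else:
                continue  # unchanged row: no write, no change record
            self.rows[k] = dict(src)

    def _log(self, change_type: str, row: Row) -> None:
        self.changes.append({**row, "_change_type": change_type,
                             "_commit_version": self.version})

    def table_changes(self, starting_version: int) -> List[Row]:
        """Return change records committed at or after starting_version."""
        return [c for c in self.changes if c["_commit_version"] >= starting_version]


def rows_to_score(changes: List[Row], key: str) -> List[Row]:
    """Keep the latest post-change image of each inserted or updated key."""
    latest: Dict[object, Row] = {}
    for c in sorted(changes, key=lambda c: c["_commit_version"]):
        if c["_change_type"] in ("insert", "update_postimage"):
            latest[c[key]] = {k: v for k, v in c.items() if not k.startswith("_")}
    return list(latest.values())


t = ToyTable(key="customer_id")
t.merge([{"customer_id": 1, "plan": "basic"},
         {"customer_id": 2, "plan": "premium"}])   # version 1: initial load
t.merge([{"customer_id": 1, "plan": "basic"},      # unchanged
         {"customer_id": 2, "plan": "basic"},      # changed
         {"customer_id": 3, "plan": "premium"}])   # new
print(rows_to_score(t.table_changes(starting_version=2), "customer_id"))
# [{'customer_id': 2, 'plan': 'basic'}, {'customer_id': 3, 'plan': 'premium'}]
```

Only customers 2 and 3 reach the prediction step; customer 1 is skipped because the merge never rewrote it. Filtering on `update_postimage` rather than `update_preimage` ensures the model scores each customer's new values.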

Databricks Certified Data Engineer Professional — Question 23

Answer options

Explanation