Databricks Certified Data Engineer Professional — Question 66
A data architect has heard about Delta Lake’s built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full record of all valid street addresses as they appear in the customers table.
The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.
Which piece of information is critical to this decision?
Answer options
- A. Data corruption can occur if a query fails in a partially completed state because Type 2 tables require setting multiple fields in a single update.
- B. Shallow clones can be combined with Type 1 tables to accelerate historic queries for long-term versioning.
- C. Delta Lake time travel cannot be used to query previous versions of these tables because Type 1 changes modify data files in place.
- D. Delta Lake time travel does not scale well in cost or latency to provide a long-term versioning solution.
- E. Delta Lake only supports Type 0 tables; once records are inserted to a Delta Lake table, they cannot be modified.
Correct answer: D
Explanation
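The correct answer is D. Time travel works by retaining superseded data files and transaction log entries, so using it as an audit trail means keeping every old file version indefinitely and never letting VACUUM remove them. Storage cost grows with every overwrite, and a query against an old version must reconstruct the table state as of that version, so both cost and latency get worse as history accumulates. A Type 2 table instead keeps each historical address as an ordinary row in the current table version, which can be queried directly and does not depend on file retention settings.
Option A is incorrect because Delta Lake writes are ACID transactions: an update that fails partway is not committed, so setting multiple fields in one operation cannot leave the table partially updated. Option B is incorrect because shallow clones only reference the source table's data files; they do not accelerate historic queries, and they can break if those files are removed. Option C is incorrect because Delta Lake does not modify data files in place; it writes new files and marks the old ones as removed in the transaction log, which is what makes time travel possible in the first place. Option E is incorrect because Delta Lake supports UPDATE, DELETE, and MERGE operations.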
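The difference between the two designs can be sketched in plain Python. This is only an illustration of the record-keeping logic, not Delta Lake or Spark code: the names `AddressRow`, `apply_type1`, `apply_type2`, and `address_as_of`, along with the sample addresses and dates, are hypothetical. A Type 1 change overwrites the stored value, so history survives only outside the table (in Delta Lake, in retained file versions). A Type 2 change closes the current row and appends a new one, so history is part of the table itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressRow:
    """One version of a customer's address (hypothetical Type 2 row layout)."""
    customer_id: int
    address: str
    effective_date: date
    end_date: Optional[date] = None  # None marks the current row


def apply_type1(table: dict, customer_id: int, new_address: str) -> None:
    """Type 1: overwrite in place; the previous address is lost from the table."""
    table[customer_id] = new_address


def apply_type2(table: list, customer_id: int, new_address: str,
                change_date: date) -> None:
    """Type 2: close the current row (if any) and append a new current row."""
    for row in table:
        if row.customer_id == customer_id and row.end_date is None:
            if row.address == new_address:
                return  # nothing changed, keep the current row open
            row.end_date = change_date
    table.append(AddressRow(customer_id, new_address, change_date))


def address_as_of(table: list, customer_id: int, as_of: date) -> Optional[str]:
    """Audit query: which address was valid on a given date?"""
    for row in table:
        if (row.customer_id == customer_id
                and row.effective_date <= as_of
                and (row.end_date is None or as_of < row.end_date)):
            return row.address
    return None


# Sample changes for one customer (made-up data for illustration).
changes = [
    (1, "12 Oak St", date(2021, 1, 5)),
    (1, "98 Elm Ave", date(2022, 6, 1)),
    (1, "7 Pine Rd", date(2023, 3, 15)),
]

type1_table = {}
type2_table = []
for customer_id, address, changed_on in changes:
    apply_type1(type1_table, customer_id, address)
    apply_type2(type2_table, customer_id, address, changed_on)

print(type1_table[1])                                   # 7 Pine Rd
print([row.address for row in type2_table])             # ['12 Oak St', '98 Elm Ave', '7 Pine Rd']
print(address_as_of(type2_table, 1, date(2022, 1, 1)))  # 12 Oak St
```

With the Type 2 layout, the audit question is answered by a normal filter on the current table. With the Type 1 layout, the same question can only be answered if every earlier version of the table is still retained, which is the cost and latency concern behind option D.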