Databricks Certified Data Engineer Professional — Question 63
To prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both DEEP CLONE and SHALLOW CLONE, the engineer chose to create development tables with SHALLOW CLONE.
A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimension (SCD) stop working. The transaction logs for the source tables show that VACUUM was run the day before.
Which statement describes why the cloned tables are no longer working?
Answer options
- A. Because Type 1 changes overwrite existing records, Delta Lake cannot guarantee data consistency for cloned tables.
- B. Running VACUUM automatically invalidates any shallow clones of a table; DEEP CLONE should always be used when a cloned table will be repeatedly queried.
- C. Tables created with SHALLOW CLONE are automatically deleted after their default retention threshold of 7 days.
- D. The metadata created by the CLONE operation is referencing data files that were purged as invalid by the VACUUM command.
- E. The data files compacted by VACUUM are not tracked by the cloned metadata; running REFRESH on the cloned table will pull in recent changes.
Correct answer: D
Explanation
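The correct answer is D. A SHALLOW CLONE copies only the source table's metadata, so the cloned table's transaction log points at data files that belong to the source table. Type 1 SCD tables are maintained with updates or merges that overwrite records, so each change rewrites the affected data files and marks the old files as removed in the source table's log. Those removed files are no longer part of the source's current version, but the clone's metadata still references them. Once they are older than the retention threshold (7 days by default), VACUUM deletes them permanently, and any query against the clone that needs one of them fails, typically with a FileNotFoundException. Re-creating the clone (for example with CREATE OR REPLACE TABLE ... SHALLOW CLONE) points it back at files that still exist.

The other options do not match this mechanism:
- A. Type 1 overwrites are ordinary ACID transactions in Delta Lake; they do not compromise the consistency of the source or of a clone.
- B. VACUUM does not invalidate shallow clones as such; a clone breaks only when it references files that VACUUM actually deleted. DEEP CLONE avoids the dependency by copying the data files, but re-creating the shallow clone also fixes the problem.
- C. Shallow clones are never deleted automatically; 7 days is VACUUM's default retention threshold for removed data files, not a lifetime for clones.
- E. VACUUM does not compact files (that is OPTIMIZE), and REFRESH cannot bring back files that have been physically deleted.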
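To make the failure mode concrete, here is a toy Python model of the sequence in the question. It is a sketch under simplifying assumptions, not Delta Lake's implementation: the classes `Storage`, `SourceTable`, and `Clone` and the functions `shallow_clone` and `deep_clone` are hypothetical, "files" are dictionary entries, and VACUUM's retention is counted in whole days from when a file was removed. It shows a shallow clone capturing the source's file list, a Type 1 overwrite tombstoning a file the clone still references, and VACUUM deleting that file once it is past the 7-day retention.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, NOT how Delta Lake is implemented. "Files" are
# dict entries, and time is counted in whole days.


@dataclass
class Storage:
    files: dict = field(default_factory=dict)  # path -> list of rows


@dataclass
class SourceTable:
    storage: Storage
    live: set = field(default_factory=set)       # files in the current version
    removed: dict = field(default_factory=dict)  # tombstoned path -> day removed
    next_id: int = 0

    def write_file(self, rows):
        path = f"part-{self.next_id:05d}.parquet"
        self.next_id += 1
        self.storage.files[path] = list(rows)
        self.live.add(path)
        return path

    def overwrite(self, old_path, new_rows, day):
        # Type 1 SCD change: write a new file and tombstone the old one.
        # The old file stays on storage until VACUUM removes it.
        self.live.discard(old_path)
        self.removed[old_path] = day
        return self.write_file(new_rows)

    def vacuum(self, today, retention_days=7):
        # Delete tombstoned files older than the retention threshold.
        for path, removed_day in list(self.removed.items()):
            if today - removed_day >= retention_days:
                del self.storage.files[path]
                del self.removed[path]


@dataclass
class Clone:
    storage: Storage  # shallow: shared with the source; deep: private copy
    files: frozenset  # file list captured in the clone's metadata

    def read(self):
        rows = []
        for path in sorted(self.files):
            if path not in self.storage.files:
                raise FileNotFoundError(path)
            rows.extend(self.storage.files[path])
        return rows


def shallow_clone(src):
    # Copies only metadata: the clone points at the source's files.
    return Clone(src.storage, frozenset(src.live))


def deep_clone(src):
    # Copies the data files too, so the clone no longer depends on the source.
    copied = Storage({p: list(src.storage.files[p]) for p in src.live})
    return Clone(copied, frozenset(src.live))


storage = Storage()
src = SourceTable(storage)
f0 = src.write_file([("cust-1", "Austin")])
dev = shallow_clone(src)       # day 0
dev_deep = deep_clone(src)     # day 0

src.overwrite(f0, [("cust-1", "Denver")], day=10)  # Type 1 update
print(dev.read())              # old file still on storage, so the clone works

src.vacuum(today=20)           # f0 was removed 10 days ago, past 7-day retention
try:
    dev.read()
except FileNotFoundError as err:
    print("shallow clone broken, missing:", err)

print(dev_deep.read())             # deep clone owns its files, so it still works
print(shallow_clone(src).read())   # re-cloning points at the current files
```

The last line plays the role of re-running the clone with CREATE OR REPLACE: a fresh shallow clone captures only files that are still live in the source, so it reads correctly until the next overwrite and VACUUM cycle.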