Databricks Certified Data Engineer Professional — Question 38
Which of the following is true of Delta Lake and the Lakehouse?
Answer options
- A. Because Parquet compresses data row by row. strings will only be compressed when a character is repeated multiple times.
- B. Delta Lake automatically collects statistics on the first 32 columns of each table which are leveraged in data skipping based on query filters.
- C. Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.
- D. Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.
- E. Z-order can only be applied to numeric values stored in Delta Lake tables.
Correct answer: B
Explanation
Option B is correct because Delta Lake does collect statistics on the first 32 columns of tables, which helps optimize query performance through data skipping. Option A is incorrect as Parquet compresses data based on patterns, not just repetition. Option C is wrong because views may not always maintain an up-to-date cache of source tables. Option D is misleading since primary and foreign key constraints do not inherently prevent duplicates in all scenarios. Option E is false because Z-order can be applied to non-numeric types as well.