Databricks Certified Data Engineer Professional — Question 42
A Databricks SQL dashboard has been configured to monitor the total number of records present in a collection of Delta Lake tables using the following query pattern:
SELECT COUNT(*) FROM table
Which of the following describes how results are generated each time the dashboard is updated?
Answer options
- A. The total count of rows is calculated by scanning all data files
- B. The total count of rows will be returned from cached results unless REFRESH is run
- C. The total count of records is calculated from the Delta transaction logs
- D. The total count of records is calculated from the parquet file metadata
- E. The total count of records is calculated from the Hive metastore
Correct answer: C
Explanation
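The correct answer is C. Each time Delta Lake writes a data file, it records an add action in the transaction log, and that action normally carries per-file statistics, including the file's row count (numRecords). An unfiltered COUNT(*) can therefore be answered by summing these counts for the files that make up the current table version, without opening the data files. Option A is incorrect because a full scan of the data files is unnecessary when the log already holds the row counts. Option B is incorrect because the count is resolved against the current table version in the log each time the dashboard updates; no manual REFRESH is needed for new commits to be reflected. Option D is incorrect because, although Parquet footers do store row counts, reading them would mean opening every file, and the footers alone do not say which files belong to the current table version. Option E is incorrect because the Hive metastore holds table definitions such as schema and location, not up-to-date row counts for Delta tables.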
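To make the idea behind option C concrete, here is a minimal Python sketch of how a row count could be derived from transaction-log entries alone. It is an illustration under stated assumptions, not how Databricks implements the optimization: the function name count_rows_from_log and the sample file paths are hypothetical, the log is passed in as a list of JSON strings rather than read from a _delta_log directory, and checkpoints, deletion vectors, and files without statistics are ignored.

```python
import json


def count_rows_from_log(commit_lines):
    """Replay add/remove actions and sum numRecords over the live files.

    Hypothetical helper for illustration only; it ignores checkpoints,
    deletion vectors, and add actions that lack statistics.
    """
    live_files = {}  # file path -> row count taken from the add action's stats
    for line in commit_lines:
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            # In the log, "stats" is itself a JSON-encoded string.
            stats = json.loads(add["stats"])
            live_files[add["path"]] = stats["numRecords"]
        elif "remove" in action:
            # A removed file no longer belongs to the current table version.
            live_files.pop(action["remove"]["path"], None)
    return sum(live_files.values())


def make_add(path, num_records):
    """Build a simplified add action (real ones carry more fields)."""
    return json.dumps({"add": {"path": path,
                               "stats": json.dumps({"numRecords": num_records})}})


# Made-up sample log: three files written, the first one later removed.
sample_log = [
    make_add("part-0000.parquet", 100),
    make_add("part-0001.parquet", 250),
    json.dumps({"remove": {"path": "part-0000.parquet"}}),
    make_add("part-0002.parquet", 40),
]

total = count_rows_from_log(sample_log)
print(total)
```

Note that no Parquet file is opened at any point: the count comes entirely from the per-file statistics recorded in the log, which is why options A and D do not describe what happens.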