Databricks Certified Data Engineer Professional — Question 166
A Delta table of weather records is partitioned by date and has the following schema:
date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT
To find all the records from within the Arctic Circle, you execute a query with the following filter:
latitude > 66.3
Which statement describes how the Delta engine identifies which files to load?
Answer options
- A. All records are cached to an operational database and then the filter is applied
- B. The Parquet file footers are scanned for min and max statistics for the latitude column
- C. The Hive metastore is scanned for min and max statistics for the latitude column
- D. The Delta log is scanned for min and max statistics for the latitude column
Correct answer: D
Explanation
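The correct answer is D. For each data file, Delta Lake records file-level statistics in the transaction log (the Delta log), including the minimum and maximum value of each indexed column (by default, the first 32 columns of the table). At query time the engine compares the filter latitude > 66.3 against these statistics and skips any file whose maximum latitude is not greater than 66.3, without opening that file. Partition pruning does not help here, because the table is partitioned by date and the filter is on latitude. Option A is incorrect because Delta Lake does not cache records to an operational database before filtering. Option B is incorrect because, although Parquet footers do contain min and max statistics, reading them would require opening every file; the Delta engine instead uses the statistics already stored in the log to decide which files to open. Option C is incorrect because the Hive metastore holds table-level metadata, not the per-file statistics used for data skipping.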
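The file-skipping logic described above can be illustrated with a small sketch. It is a simplified model, not the Delta engine's actual implementation: it ignores checkpoints, remove actions, and partition values, and the log entries and file paths are hypothetical. It does follow the log layout in which each add action carries a JSON-encoded stats string with minValues and maxValues.

```python
import json

# Hypothetical excerpt of a Delta commit file: one JSON action per line.
# The file paths and statistics are made up for illustration.
LOG_LINES = [
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {
        "path": "date=2024-01-01/part-0000.parquet",
        "stats": json.dumps({"numRecords": 3,
                             "minValues": {"latitude": 10.5},
                             "maxValues": {"latitude": 45.0}}),
    }}),
    json.dumps({"add": {
        "path": "date=2024-01-02/part-0001.parquet",
        "stats": json.dumps({"numRecords": 5,
                             "minValues": {"latitude": 60.1},
                             "maxValues": {"latitude": 70.2}}),
    }}),
    # An add action without statistics: the file cannot be skipped safely.
    json.dumps({"add": {"path": "date=2024-01-03/part-0002.parquet"}}),
]


def files_to_read(log_lines, column, lower_bound):
    """Return paths of files that may contain rows with column > lower_bound."""
    selected = []
    for line in log_lines:
        add = json.loads(line).get("add")
        if add is None:
            continue  # not a file-adding action (e.g. commitInfo)
        stats_raw = add.get("stats")
        if not stats_raw:
            selected.append(add["path"])  # no stats: must read the file
            continue
        max_value = json.loads(stats_raw).get("maxValues", {}).get(column)
        # Skip the file only when its maximum proves that no row can match.
        if max_value is None or max_value > lower_bound:
            selected.append(add["path"])
    return selected


CANDIDATES = files_to_read(LOG_LINES, "latitude", 66.3)
print(CANDIDATES)
```

In this sketch the first file is skipped because its maximum latitude (45.0) is below 66.3, while the second file and the file without statistics are both kept. No Parquet file is opened to make that decision, which is the point of option D.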