Databricks Certified Data Engineer Professional — Question 7
A Delta table of weather records is partitioned by date and has the following schema: date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT
To find all the records from within the Arctic Circle, you execute a query with the following filter: latitude > 66.3
Which statement describes how the Delta engine identifies which files to load?
Answer options
- A. All records are cached to an operational database and then the filter is applied
- B. The Parquet file footers are scanned for min and max statistics for the latitude column
- C. All records are cached to attached storage and then the filter is applied
- D. The Delta log is scanned for min and max statistics for the latitude column
- E. The Hive metastore is scanned for min and max statistics for the latitude column
Correct answer: D
Explanation
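The correct answer is D. Delta Lake records per-file statistics, including the minimum and maximum value of each indexed column, in the file entries of the Delta transaction log. By default these statistics are collected for the first 32 columns of the table, which covers latitude here. Because the table is partitioned by date, partition pruning cannot narrow a filter on latitude, so the engine instead compares the predicate latitude > 66.3 against each file's recorded maximum and skips any file whose maximum latitude is 66.3 or lower. Options A and C are incorrect because Delta does not cache all records before applying a filter. Option B is incorrect because, although Parquet footers do contain min and max statistics, reading them would require opening every file; the Delta log lets the engine decide which files to open in the first place. Option E is incorrect because the Hive metastore does not hold the per-file column statistics used for data skipping.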
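The data-skipping step described above can be sketched in plain Python. This is an illustration only, not the Delta engine's implementation: the file paths and statistic values in `LOG_LINES` are made up, and `files_to_load` is a hypothetical helper. The sketch assumes the log layout in which each `add` action carries a `stats` field holding a JSON-encoded string with `minValues` and `maxValues`.

```python
import json

# Hypothetical excerpt of a Delta log commit: one JSON action per line.
# Paths and values are invented for illustration.
LOG_LINES = [
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {
        "path": "date=2024-01-01/part-0000.parquet",
        "partitionValues": {"date": "2024-01-01"},
        "stats": json.dumps({"numRecords": 100,
                             "minValues": {"latitude": -10.5},
                             "maxValues": {"latitude": 42.0}}),
    }}),
    json.dumps({"add": {
        "path": "date=2024-01-01/part-0001.parquet",
        "partitionValues": {"date": "2024-01-01"},
        "stats": json.dumps({"numRecords": 80,
                             "minValues": {"latitude": 55.0},
                             "maxValues": {"latitude": 70.1}}),
    }}),
    json.dumps({"add": {
        "path": "date=2024-01-02/part-0000.parquet",
        "partitionValues": {"date": "2024-01-02"},
        # No stats recorded for this file, so it cannot be skipped safely.
    }}),
]


def files_to_load(log_lines, column, lower_bound):
    """Return paths of files that may contain rows where column > lower_bound."""
    selected = []
    for line in log_lines:
        add = json.loads(line).get("add")
        if add is None:
            continue  # not a file entry (e.g. commitInfo)
        stats_raw = add.get("stats")
        if stats_raw is None:
            selected.append(add["path"])  # no statistics: must read the file
            continue
        max_value = json.loads(stats_raw).get("maxValues", {}).get(column)
        # Keep the file unless its recorded maximum rules out every row.
        if max_value is None or max_value > lower_bound:
            selected.append(add["path"])
    return selected


selected = files_to_load(LOG_LINES, "latitude", 66.3)
print(selected)
```

Only the maximum is needed for a `>` predicate; a `<` predicate would be checked against the recorded minimum in the same way. Files without statistics are kept because skipping them could drop matching rows.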