Databricks Certified Data Engineer Professional — Question 31
A Delta Lake table representing metadata about content posts from users has the following schema: user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE
This table is partitioned by the date column. A query is run with the following filter: longitude < 20 & longitude > -20
Which statement describes how data will be filtered?
Answer options
- A. Statistics in the Delta Log will be used to identify partitions that might include files in the filtered range.
- B. No file skipping will occur because the optimizer does not know the relationship between the partition column and the longitude.
- C. The Delta Engine will use row-level statistics in the transaction log to identify the files that meet the filter criteria.
- D. Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.
- E. The Delta Engine will scan the parquet file footers to identify each row that meets the filter criteria.
Correct answer: D
Explanation
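The correct answer is D. For each data file, the Delta transaction log records file-level statistics, including minimum and maximum values for indexed columns (by default, the first 32 columns of the table). The engine compares the filter on longitude against these ranges and skips any file whose range cannot overlap it, so the remaining files are those that might include matching records. Option A is incorrect because the statistics are used to skip individual data files, not partitions; partition pruning applies only to filters on the partition column, date, which this query does not reference. Option B is incorrect because file skipping does not depend on any relationship between the partition column and longitude; the per-file statistics on longitude are sufficient. Option C is incorrect because the transaction log holds file-level statistics, not row-level statistics. Option E is incorrect because the engine consults the statistics in the transaction log to skip files rather than scanning Parquet footers, and footers do not identify individual rows.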
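The file-skipping idea behind answer D can be illustrated with a minimal sketch in plain Python. This is a simplified model, not Delta Lake's actual implementation: the `FileStats` class, the `files_to_scan` function, and the file paths and longitude ranges are all hypothetical, chosen only to show how per-file minimum and maximum values let an engine rule files out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical stand-in for the per-file statistics kept in the log."""
    path: str
    min_longitude: float
    max_longitude: float


def files_to_scan(files, lower, upper):
    """Keep files whose [min, max] range could hold a value v with lower < v < upper."""
    return [
        f.path
        for f in files
        if f.max_longitude > lower and f.min_longitude < upper
    ]


# Invented example data: two date partitions, two files each.
files = [
    FileStats("date=2024-01-01/part-0.parquet", -120.0, -75.0),
    FileStats("date=2024-01-01/part-1.parquet", -30.0, 5.0),
    FileStats("date=2024-01-02/part-0.parquet", 25.0, 140.0),
    FileStats("date=2024-01-02/part-1.parquet", -19.5, 19.5),
]

# Filter from the question: longitude < 20 and longitude > -20.
selected = files_to_scan(files, lower=-20.0, upper=20.0)
print(selected)
```

In this sketch, one file is kept from each date partition, which shows that skipping happens per file rather than per partition. A kept file only might contain matching rows: a file with a range of -30.0 to 5.0 passes the check even if none of its rows fall between -20 and 20, so the rows still have to be filtered after the file is read.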