A Delta Lake table representing metadata about content posts from users has the following…

Question

A Delta Lake table representing metadata about content posts from users has the following schema: user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE
This table is partitioned by the date column. A query is run with the following filter: longitude < 20 & longitude > -20
Which statement describes how data will be filtered?

Accepted Answer

Correct answer: D. D. Statistics in the Delta Log will be used to identify data files that might include records in the filtered range. — The correct answer is D because the Delta Log maintains statistics that allow the engine to identify which data files may contain records that satisfy the filter condition on longitude. Option A is incorrect because it refers to partitions, while the filtering is based on file statistics. Option B is wrong as it misrepresents the relationship between partitions and filtering. Option C is incorrect because it mentions row-level statistics, which are not utilized in this filtering scenario.

Databricks Certified Data Engineer Professional — Question 31

Answer options

Correct answer: D

Explanation