Databricks Certified Data Engineer Professional — Question 197
A Delta Lake table representing metadata about content posts from users has the following schema:
user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE
Based on this schema, which column is a good candidate for partitioning the Delta table?
Answer options
- A. post_time
- B. date
- C. post_id
- D. user_id
Correct answer: B
Explanation
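The 'date' column is the best choice for partitioning this Delta table. It has low cardinality (one value per calendar day), so each partition holds a substantial amount of data, and queries that filter on a date range can skip every partition outside that range. 'post_time' also relates to time, but as a TIMESTAMP it is nearly unique per row, so partitioning on it would create a very large number of tiny partitions. 'post_id' has the same problem, since it presumably identifies an individual post, and 'user_id' is also high-cardinality and does not help time-based queries. Databricks guidance is to partition only on low-cardinality columns where each partition is expected to hold at least about 1 GB of data.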
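
The cardinality argument can be illustrated with a small, self-contained sketch in plain Python. It does not use Spark or Delta Lake; it only generates synthetic rows and counts distinct values per candidate column, since each distinct value of a partition column becomes its own partition directory. The row counts, user counts, date range, and the helper names `synthetic_posts` and `partition_counts` are all assumptions made up for illustration, not part of the question.

```python
import random
from datetime import datetime, timedelta


def synthetic_posts(n_posts=10_000, n_users=2_000, n_days=30, seed=0):
    """Generate hypothetical post rows; all sizes here are illustrative assumptions."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    span_us = n_days * 86_400 * 1_000_000  # length of the window in microseconds
    posts = []
    for i in range(n_posts):
        post_time = start + timedelta(microseconds=rng.randrange(span_us))
        posts.append({
            "user_id": rng.randrange(n_users),
            "post_id": f"post-{i}",      # assumed unique per post
            "post_time": post_time,      # microsecond-precision timestamp
            "date": post_time.date(),    # calendar day derived from post_time
        })
    return posts


def partition_counts(posts, columns=("post_time", "date", "post_id", "user_id")):
    """Count distinct values per column, i.e. how many partitions each would create."""
    return {col: len({row[col] for row in posts}) for col in columns}


posts = synthetic_posts()
counts = partition_counts(posts)
for column, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{column:>9}: {n} distinct values, about {len(posts) / n:.1f} rows per partition")
```

Under these assumptions, 'date' produces at most one partition per day in the window, while 'post_id' produces one partition per row. Real tables are far larger, but the ratio between the columns is what drives the choice.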