Databricks Certified Data Engineer Professional — Question 130

All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:

key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG

Five unique topics are being ingested. Only the "registration" topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII and to retain records containing PII in this table for only 14 days after initial ingestion. Non-PII records, however, should be retained indefinitely.

Which solution meets the requirements?

Answer options

Correct answer: C

Explanation

Option C is correct because partitioning the table by the topic field places all "registration" records in their own partition, separate from the other four topics. Access controls and DELETE statements can then follow partition boundaries: PII can be restricted and removed after 14 days without touching the non-PII topics, which are retained indefinitely. Options A and B do not meet the requirement to retain non-PII records indefinitely, and Option D complicates the storage layout without addressing the retention requirement.
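
The partition-and-delete approach described above can be sketched as follows. This is a minimal illustration, not an official solution: the table name `kafka_events` is hypothetical, the `timestamp` column is assumed to hold epoch milliseconds, and it is assumed to approximate ingestion time. The helpers only build SQL strings, so they can be checked without a cluster; access controls are not shown.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table name; the question does not name the table.
TABLE = "kafka_events"
PII_TOPIC = "registration"
RETENTION_DAYS = 14


def create_table_sql(table: str = TABLE) -> str:
    """Build DDL for the schema in the question, partitioned by topic."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "key BINARY, value BINARY, topic STRING, "
        "`partition` LONG, `offset` LONG, `timestamp` LONG"
        ") USING DELTA PARTITIONED BY (topic)"
    )


def pii_delete_sql(
    now: datetime,
    table: str = TABLE,
    topic: str = PII_TOPIC,
    days: int = RETENTION_DAYS,
) -> str:
    """Build a DELETE that removes PII-topic rows older than the retention window.

    Assumes `timestamp` is epoch milliseconds (an assumption, not stated
    in the question). The topic predicate confines the delete to one partition.
    """
    cutoff_ms = int((now - timedelta(days=days)).timestamp()) * 1000
    return (
        f"DELETE FROM {table} "
        f"WHERE topic = '{topic}' AND `timestamp` < {cutoff_ms}"
    )


# On a cluster with an active SparkSession named `spark`, a scheduled job
# might run something like:
#   spark.sql(create_table_sql())
#   spark.sql(pii_delete_sql(datetime.now(timezone.utc)))
```

Because the DELETE filters on the partition column, only files in the "registration" partition are rewritten or removed. Note that a Delta Lake DELETE removes rows from the current table version only; the underlying data files remain reachable through time travel until VACUUM removes them, so a periodic VACUUM would also be needed for the PII to be physically gone.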