An hourly batch job is configured to ingest data files from a cloud object storage contai…

Question

An hourly batch job is configured to ingest data files from a cloud object storage container where each batch represents all records produced by the source system in a given hour. The batch job to process these records into the Lakehouse is sufficiently delayed to ensure no late-arriving data is missed. The user_id field represents a unique key for the data, which has the following schema: user_id BIGINT, username STRING, user_utc STRING, user_region STRING, last_login BIGINT, auto_pay BOOLEAN, last_updated BIGINT. New records are all ingested into a table named account_history, which maintains a full record of all data in the same schema as the source. The next table in the system is named account_current and is implemented as a Type 1 table representing the most recent value for each unique user_id. Which implementation can be used to efficiently update the described account_current table as part of each hourly batch job, assuming there are millions of user accounts and tens of thousands of records processed hourly?

Accepted Answer

Correct answer: D. Filter records in account_history using the last_updated field and the most recent hour processed, as well as the max last_login by user_id; write a merge statement to update or insert the most recent value for each user_id.

Option D is the best choice because it filters account_history on last_updated to the most recent hour processed, keeps the row with the max last_login for each user_id, and merges only those rows into account_current. The job therefore works with the tens of thousands of records from the current batch rather than the millions of accounts in the table. Option A is incorrect because it deduplicates on username, which is not the unique key; user_id is. Option B is not suitable since it relies on Auto Loader and streaming, which may not align with the batch job requirement. Option C is inefficient because it overwrites the entire account_current table on every run, which may lead to performance issues and to data loss if not managed carefully.
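The filter-then-merge logic in option D can be sketched in plain Python to make the steps concrete. This is only an illustration under stated assumptions, not the Databricks implementation: on the platform the same idea would be expressed as a merge statement against the account_current table. The function names latest_per_user and merge_into_current, the dict-based tables, the half-open hour window, and the tie-break on last_updated are all hypothetical choices made for this example, and the rows carry only a subset of the schema for brevity.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def latest_per_user(history: List[Row], window_start: int, window_end: int) -> Dict[int, Row]:
    """Keep one row per user_id from the hour being processed.

    Rows are filtered on last_updated (window_start inclusive, window_end
    exclusive, an assumption), then the row with the max last_login wins.
    Ties on last_login fall back to last_updated (also an assumption).
    """
    latest: Dict[int, Row] = {}
    for row in history:
        if not (window_start <= row["last_updated"] < window_end):
            continue  # belongs to another hourly batch
        best = latest.get(row["user_id"])
        rank = (row["last_login"], row["last_updated"])
        if best is None or rank > (best["last_login"], best["last_updated"]):
            latest[row["user_id"]] = row
    return latest


def merge_into_current(current: Dict[int, Row], updates: Dict[int, Row]) -> Dict[int, Row]:
    """Type 1 upsert: overwrite matched user_ids, insert unmatched ones."""
    for user_id, row in updates.items():
        current[user_id] = row
    return current


# Hypothetical sample data; timestamps are epoch seconds.
history = [
    {"user_id": 1, "username": "ann", "last_login": 3700, "last_updated": 3800},
    {"user_id": 1, "username": "ann2", "last_login": 3900, "last_updated": 4000},
    {"user_id": 2, "username": "bob", "last_login": 100, "last_updated": 200},
    {"user_id": 3, "username": "cy", "last_login": 5000, "last_updated": 5100},
]
current = {
    1: {"user_id": 1, "username": "ann_old", "last_login": 50, "last_updated": 60},
    2: {"user_id": 2, "username": "bob", "last_login": 100, "last_updated": 200},
}

updates = latest_per_user(history, window_start=3600, window_end=7200)
merge_into_current(current, updates)
```

Only the two users who changed in the processed hour are written; the row for user_id 2 is never touched, which is the efficiency argument for a merge over a full overwrite.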

Databricks Certified Data Engineer Professional — Question 134
