Data Engineering on Microsoft Azure — Question 10
You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.
You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.
What should you include in the solution?
Answer options
- A. Partition by DateTime fields.
- B. Sink to Azure Queue storage.
- C. Include a watermark column.
- D. Use a JSON format for physical data storage.
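Correct answer: A
Explanation
The correct answer is A. Partitioning the table by DateTime fields (for example, an event date derived from the event timestamp) lets an incremental load job read only the partitions that contain new data instead of scanning the whole table, which keeps incremental load times low as the table grows by roughly 20 million events per day. Date partitions also make it straightforward to delete or archive old data, which helps control storage costs.
The other options do not meet the requirements. B: Azure Queue storage is a messaging service, not a store for table data, so it cannot persist the events in a Databricks table. C: A watermark column helps a pipeline track which rows it has already loaded, but by itself it does not limit how much data is scanned and it does nothing for storage costs. D: JSON is a verbose, row-oriented text format; it generally takes more space and is slower to scan than a compressed columnar format such as the Parquet files that underlie Delta tables.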
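To illustrate why partitioning by a date field (option A) shortens incremental loads, here is a minimal sketch in plain Python. It is a simulation on the local file system, not Databricks or Delta Lake code. The function names `write_event` and `incremental_load`, the `event_date=YYYY-MM-DD` folder naming, and the `event_id` and `event_time` columns are all assumptions made for illustration. CSV is used only to keep the sketch dependency-free; a real table would typically use a columnar format.

```python
import csv
from datetime import date, datetime
from pathlib import Path
from typing import Dict, List


def write_event(root: Path, event: Dict[str, str]) -> Path:
    """Append one event to the folder for its event date (hypothetical layout)."""
    event_date = datetime.fromisoformat(event["event_time"]).date()
    partition_dir = root / f"event_date={event_date.isoformat()}"
    partition_dir.mkdir(parents=True, exist_ok=True)
    path = partition_dir / "events.csv"
    # CSV only for illustration; a real table would use a columnar format.
    with path.open("a", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerow([event["event_id"], event["event_time"]])
    return path


def incremental_load(root: Path, last_loaded: date) -> List[Dict[str, str]]:
    """Return events from partitions newer than last_loaded.

    Folders for older dates are skipped by name, so their files are never opened.
    """
    rows: List[Dict[str, str]] = []
    for partition_dir in sorted(root.glob("event_date=*")):
        partition_date = date.fromisoformat(partition_dir.name.split("=", 1)[1])
        if partition_date <= last_loaded:
            continue  # pruned: no file in this partition is read
        with (partition_dir / "events.csv").open(newline="", encoding="utf-8") as fh:
            for event_id, event_time in csv.reader(fh):
                rows.append({"event_id": event_id, "event_time": event_time})
    return rows
```

Calling `incremental_load(root, date(2024, 5, 1))` on a folder that holds events for 1 May and 2 May 2024 would open only the `event_date=2024-05-02` folder. The work done by each incremental run therefore depends on the amount of new data, not on the total size of the table.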