Designing an Azure Data Solution (legacy) — Question 33
You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:
✑ The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.
✑ Line total sales amount and line total tax amount will be aggregated in Databricks.
✑ Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.
You need to recommend an output mode for the dataset that will be processed by using Structured Streaming. The solution must minimize duplicate data.
What should you recommend?
Answer options
- A. Append
- B. Complete
- C. Update
Correct answer: A
Explanation
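Append is the best fit because it writes only the rows added to the result table since the last trigger, and each row is written to the sink exactly once. That matches the requirement that sales transactions are never updated and that adjustments arrive as new rows. Complete mode rewrites the entire result table to the sink on every trigger, so rows that were already written are output again, which works against minimizing duplicate data. Update mode writes only the rows that changed since the last trigger; it is meant for results whose existing rows are revised over time, which the specification rules out for sales transactions. Note that in Structured Streaming, Append on an aggregated streaming query requires a watermark so that aggregate rows can be finalized before they are written.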
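The difference between the three output modes can be illustrated with a small toy simulation. This is a plain-Python sketch, not Spark code: the `simulate` function, the `line-N` keys, and the sample amounts are all hypothetical, and the model assumes each transaction line becomes its own result row (so no result row is ever revised, as the question specifies).

```python
from typing import Dict, List, Tuple

# (item, quantity, line total sales amount, line total tax amount)
Row = Tuple[str, int, float, float]


def simulate(batches: List[Dict[str, Row]], mode: str) -> List[List[str]]:
    """Return, for each trigger, the keys of the result rows written to the sink.

    Toy model of the three output modes; it does not call any Spark API.
    """
    result: Dict[str, Row] = {}  # stands in for the conceptual result table
    written: List[List[str]] = []
    for batch in batches:
        # Rows that are new or whose values differ from what is already stored.
        changed = [k for k, row in batch.items() if result.get(k) != row]
        # Rows whose key has never been seen before.
        new = [k for k in batch if k not in result]
        result.update(batch)
        if mode == "complete":
            written.append(sorted(result))   # whole table, every trigger
        elif mode == "update":
            written.append(sorted(changed))  # only new or changed rows
        elif mode == "append":
            written.append(sorted(new))      # only rows never written before
        else:
            raise ValueError(f"unknown mode: {mode}")
    return written


# Hypothetical sample data: the second trigger adjusts a sale by adding a new row.
batches = [
    {"line-1": ("widget", 2, 20.0, 1.6), "line-2": ("gadget", 1, 15.0, 1.2)},
    {"line-3": ("widget", -1, -10.0, -0.8)},
]

for mode in ("append", "update", "complete"):
    rows_per_trigger = simulate(batches, mode)
    print(mode, rows_per_trigger, "total rows written:", sum(map(len, rows_per_trigger)))
```

Under these assumptions, Append and Update both write three rows in total, because no existing row ever changes, while Complete writes five, because the second trigger writes the first two rows again. That repeated output is the duplicate data the question asks to minimize.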