You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named…

Question

You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage1. New files are uploaded daily to storage1. You need to recommend a solution that configures storage1 as a structured streaming source. The solution must meet the following requirements:

• Incrementally process new files as they are uploaded to storage1.
• Minimize implementation and maintenance effort.
• Minimize the cost of processing millions of files.
• Support schema inference and schema drift.

Which should you include in the recommendation?

Accepted Answer

Correct answer: C. Auto Loader. Auto Loader is designed to incrementally process new files as they arrive in cloud storage, does so cost-effectively, and supports schema inference and schema drift. Options A and D, while related, do not provide the same level of integration with Azure Databricks for streaming and incremental file processing. Option B, Azure Data Factory, is focused on data orchestration and would likely involve more implementation and maintenance effort than Auto Loader.
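
To make the answer concrete, here is a minimal sketch of what an Auto Loader stream over storage1 might look like in a Databricks notebook. It is an illustration under assumptions, not part of the original question: the container name `landing`, the folder paths, the table name `bronze_incoming`, the JSON file format, and the helper names `auto_loader_options` and `start_ingest` are all hypothetical. The `cloudFiles` source itself is only available on a Databricks runtime, so `start_ingest` expects the notebook's existing `spark` session.

```python
# Hypothetical locations: the container, folders, and table name below are
# placeholders chosen for illustration, not values from the question.
SOURCE_PATH = "abfss://landing@storage1.dfs.core.windows.net/incoming/"
SCHEMA_PATH = "abfss://landing@storage1.dfs.core.windows.net/_schemas/incoming/"
CHECKPOINT_PATH = "abfss://landing@storage1.dfs.core.windows.net/_checkpoints/incoming/"
TARGET_TABLE = "bronze_incoming"


def auto_loader_options(schema_location, file_format="json", use_notifications=False):
    """Build reader options for an Auto Loader (cloudFiles) source."""
    return {
        # Format of the incoming files (assumed to be JSON here).
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema and tracks its changes.
        "cloudFiles.schemaLocation": schema_location,
        # Infer column types instead of reading every column as a string.
        "cloudFiles.inferColumnTypes": "true",
        # Schema drift: add newly seen columns to the tracked schema.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # File notification mode avoids repeated directory listing on very
        # large folders; it needs extra Azure permissions, so it is off here.
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }


def start_ingest(spark, source_path=SOURCE_PATH, schema_path=SCHEMA_PATH,
                 checkpoint_path=CHECKPOINT_PATH, target_table=TARGET_TABLE):
    """Start an incremental ingest from storage1 into a Delta table."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_path))
        .load(source_path)
    )
    return (
        stream.writeStream
        # The checkpoint records which files have already been processed.
        .option("checkpointLocation", checkpoint_path)
        # Let the Delta sink accept columns added by schema evolution.
        .option("mergeSchema", "true")
        # Process everything that is new, then stop (suits a daily job).
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

In a notebook, calling `start_ingest(spark)` once per scheduled run would pick up only the files that arrived since the previous run, because progress is kept in the checkpoint. Running the stream as a scheduled `availableNow` job rather than continuously is one possible way to keep compute cost down when files arrive daily; a continuously running stream is equally valid if lower latency matters.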

Data Engineering on Microsoft Azure — Question 49
