Implementing an Azure Data Solution (legacy) — Question 53
A company plans to store hundreds of files in an Azure Storage account and an Azure Data Lake Storage account. The files will be stored in the Parquet format. A solution must be put in place that meets the following requirements:
- Process the data every 5 hours
- Support interactive data analysis
- Process data using solid-state drive (SSD) caching
- Use Directed Acyclic Graph (DAG) processing mechanisms
- Support REST API calls for monitoring purposes
- Support Python and integration with Microsoft Power BI
Which of the following would you consider for the solution?
Answer options
- A. Azure SQL Data Warehouse
- B. HDInsight Apache Storm cluster
- C. Azure Stream Analytics
- D. HDInsight Spark cluster
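Correct answer: D
Explanation
The correct choice is the HDInsight Spark cluster. Apache Spark on HDInsight plans each job as a Directed Acyclic Graph of stages, supports in-memory and SSD caching, and offers interactive data analysis through Jupyter and Zeppelin notebooks. It exposes a REST API through Apache Livy for submitting and monitoring jobs, supports Python through PySpark, and integrates with Microsoft Power BI. It can also read Parquet files from Azure Storage and Azure Data Lake Storage, and a batch job can be scheduled to run every 5 hours. The other options do not meet all of the requirements: HDInsight Apache Storm and Azure Stream Analytics are built for real-time stream processing rather than scheduled batch processing and interactive analysis, and Azure SQL Data Warehouse is a relational data warehouse that does not offer Spark-style interactive notebooks or SSD-cached DAG execution with Python.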
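The REST monitoring requirement can be illustrated with a minimal Python sketch. It assumes the cluster exposes Apache Livy at `https://<cluster>.azurehdinsight.net/livy` with HTTP basic authentication; the cluster name, batch ID, and credentials are hypothetical placeholders, and the helper function names are made up for this example. The request is only built, not sent, and the JSON payload is a sample rather than real cluster output.

```python
import base64
import json
import urllib.request


def build_livy_batch_status_request(cluster_name, batch_id, user, password):
    """Build (but do not send) a GET request for one Livy batch job."""
    # Assumed endpoint layout: Livy's /batches/{id} resource under /livy.
    url = f"https://{cluster_name}.azurehdinsight.net/livy/batches/{batch_id}"
    # HTTP basic auth header from the (placeholder) cluster login.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def parse_batch_state(response_body):
    """Extract the 'state' field from a Livy batch JSON response."""
    return json.loads(response_body)["state"]


if __name__ == "__main__":
    # Hypothetical cluster name, batch ID, and credentials.
    req = build_livy_batch_status_request("mycluster", 7, "admin", "placeholder")
    print(req.get_method(), req.full_url)
    # Sample payload for illustration only, not real cluster output.
    print(parse_batch_state('{"id": 7, "state": "running"}'))
```

To poll a real cluster, the request object would be passed to `urllib.request.urlopen` and the response body handed to `parse_batch_state`; a monitoring script could repeat this until the job leaves the running state.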