Implementing an Azure Data Solution (legacy) — Question 4

Each day, a company plans to store hundreds of files in Azure Blob Storage and Azure Data Lake Storage. The company uses the Parquet format.
You must develop a pipeline that meets the following requirements:
✑ Process data every six hours
✑ Offer interactive data analysis capabilities
✑ Offer the ability to process data using solid-state drive (SSD) caching
✑ Use Directed Acyclic Graph (DAG) processing mechanisms
✑ Provide support for REST API calls to monitor processes
✑ Provide native support for Python
✑ Integrate with Microsoft Power BI
You need to select the appropriate data technology to implement the pipeline.
Which data technology should you implement?

Answer options

Correct answer: B

Explanation

The correct answer is B, HDInsight Apache Storm cluster. Storm is designed for real-time data processing and runs topologies whose components are arranged in a directed acyclic graph (DAG), and those components can be written in Python. It also meets the requirements for interactive analysis and SSD caching. Azure SQL Data Warehouse and an HDInsight Apache Hadoop cluster using MapReduce do not effectively support the required interactive capabilities and processing mechanisms. Azure Stream Analytics is better suited to event-driven scenarios than to batch processing every six hours.
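
To make the DAG requirement concrete, here is a minimal sketch in plain Python of what "DAG processing" means: each stage runs only after the stages it depends on. It uses only the standard library (graphlib, Python 3.9+) and no Azure or Storm API. The stage names and the execution_order helper are hypothetical and chosen purely for illustration.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline stages (illustrative names only).
# Each key maps to the set of stages that must finish before it can run.
pipeline = {
    "read_parquet": set(),
    "clean": {"read_parquet"},
    "aggregate": {"clean"},
    "profile": {"clean"},
    "publish_report": {"aggregate", "profile"},
}


def execution_order(dag):
    """Return the stage names in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if the dependency graph contains no cycle."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


order = execution_order(pipeline)
print(order)
```

The relative order of independent stages (here, aggregate and profile) is not fixed, which is why a DAG engine is free to run them in parallel. A graph with a cycle has no valid order at all, which is_acyclic reports as False.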