Microsoft Adds Apache Storm Analytics Processing for Hadoop on Azure

Published on 23/05/2015

Microsoft announced that it is giving its Azure public cloud service new capabilities for analytics and for processing large flows of data. The company added support for real-time analytics on Apache Hadoop in Azure HDInsight through Apache Storm, along with new machine learning capabilities in the Azure Marketplace.

The Azure team describes Storm as a distributed, fault-tolerant, open-source computation system that allows you to process data in real time. Storm solutions can also provide guaranteed processing of data, with the ability to replay data that was not successfully processed the first time.
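
The replay idea can be illustrated with a minimal sketch in plain Python. This is not Storm's API: the function names, the retry limit and the deliberately flaky processor are all assumptions made for illustration. A message that fails is put back on the queue and tried again, so a transient error does not lose data.

```python
from collections import deque


def run_with_replay(messages, process, max_attempts=3):
    """Process each message; re-queue failures until max_attempts is reached."""
    # Each queue entry is (message, attempt number). Hypothetical design,
    # not how Storm tracks tuples internally.
    pending = deque((msg, 1) for msg in messages)
    done, dropped = [], []
    while pending:
        msg, attempt = pending.popleft()
        try:
            process(msg)
            done.append(msg)  # acknowledged: processed successfully
        except Exception:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))  # failed: replay later
            else:
                dropped.append(msg)  # gave up after max_attempts
    return done, dropped


def make_flaky_processor():
    """Hypothetical processor that fails the first time it sees 'b'."""
    seen = set()

    def process(msg):
        if msg == "b" and msg not in seen:
            seen.add(msg)
            raise RuntimeError("transient failure")

    return process


done, dropped = run_with_replay(["a", "b", "c"], make_flaky_processor())
print(done, dropped)
```

Here the message "b" fails once, is replayed after "c", and succeeds on the second attempt. The trade-off is that replayed messages can arrive out of order and may be processed more than once, so the processing step should tolerate duplicates.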

Microsoft said that big data, including Hadoop and advanced analytics, is changing the way users do analytics. As they collect and store more data than ever before, they expect more from their data and want more insights from it, including being able to do real-time analytics over streams of data to complement their existing Hadoop deployments. Microsoft’s approach is to make it easier for users to work with data of any type and size — using the tools, languages and frameworks they want — in a trusted cloud environment.

By providing real-time analysis capabilities in HDInsight, Microsoft opens up new business scenarios, such as analyzing operational data in real time for predictive maintenance. The preview availability of Storm in HDInsight continues Microsoft’s investment in the Hadoop ecosystem and HDInsight. Recently, the company also announced support for HBase clusters on the Azure cloud platform.

Microsoft is also introducing new machine learning capabilities in the Azure Marketplace, so that customers and partners can take advantage of new functionality in the form of web services. One of the new services is a recommendations engine for adding product recommendations to a website. Other new machine learning services available on the Marketplace include an anomaly detection service for preventive maintenance or fraud detection, as well as many packages for the R programming language.

Hortonworks announced that the next version of the Hortonworks Data Platform (HDP) will include a hybrid data connection, which will allow customers to extend their Hadoop deployments to Azure and use the cloud for backup, testing and scaling.

The addition of this feature follows Amazon’s announcement last year of Kinesis, a fully managed service for real-time processing of continuously collected data on a massive scale (hundreds of terabytes of data per hour, from hundreds of thousands of sources).

In late January, Amazon announced the Kinesis Storm Spout, an offering that integrates Kinesis with Storm. In June, at its developer conference, Google publicly introduced Google Cloud Dataflow, a tool intended to put it on the same footing as Amazon. It lets users build data pipelines in streaming or batch mode, control their execution, and process and analyze data, all in the cloud. It is now Microsoft’s turn to compete with both Amazon and Google.