Microsoft has announced that Apache Hadoop 3.0 is now generally available on the Azure HDInsight analytics platform, along with a range of enhancements aimed at giving users more analytics capabilities.
The Apache Hadoop 3.0 news was announced by Arindam Chatterjee, the principal group program manager for Azure HDInsight, in an April 15 post on the Microsoft Azure Blog.
“With the general availability of Apache Hadoop 3.0 on Azure HDInsight, we are building upon existing capabilities with a number of key enhancements that further improve performance and security and deepen support for the rich ecosystem of big data analytics applications,” wrote Chatterjee.
Azure HDInsight, a managed open-source analytics service for enterprises, works in conjunction with a variety of open-source frameworks, including Hadoop, Apache Spark, Apache Hive, LLAP, Apache Kafka, Apache Storm and R. Azure HDInsight allows users to quickly process large stores of data for analysis.
Azure HDInsight services are available in 30 public regions and Azure Government Clouds in the United States and Germany, allowing users to perform a variety of analyses on mission-critical data in a wide range of business segments.
Several important new features in Apache Hadoop 3.0 will help users get the most out of their Azure HDInsight analyses, wrote Chatterjee. One is the inclusion of Apache Hive 3.0, which will allow developers to build traditional database applications on massive data lakes. "This is particularly important for enterprises who need to build GDPR/privacy compliant big data applications," he wrote.
Also new in Apache Hadoop 3.0 is the Hive Warehouse Connector for Apache Spark, which allows developers to move data integrations from the metastore layer to the query engine layer, improving both performance and reliability, wrote Chatterjee.
In addition, Apache HBase 2.0 and Apache Phoenix 5.0 have received a number of performance, stability and integration improvements. For example, Phoenix 5.0 adds a query log: a new system table that captures information about the queries being run against the cluster, giving users more visibility into their workloads.
Upgraded Security, Compliance Features
The new version also upgrades enterprise-grade security and compliance features, a critical requirement for customers building big data applications that store or process sensitive data in the cloud, wrote Chatterjee.
Among the security improvements is Enterprise Security Package (ESP) support for Apache HBase, which allows users to authenticate to their HDInsight HBase clusters with their corporate domain credentials while remaining subject to rich, fine-grained access policies, he continued. The release also adds Bring Your Own Key (BYOK) support for Apache Kafka, which lets customers bring their own encryption keys into Azure Key Vault and use them to encrypt the Azure Managed Disks that store their Apache Kafka messages.
“We look forward to seeing what innovations you will bring to your users and customers with Azure HDInsight,” wrote Chatterjee. “Read the developer guide and follow the quick start guide to learn more about implementing open source analytics pipelines on Azure HDInsight.”
Azure HDInsight supports an expanding application ecosystem, with a variety of popular big data applications available on Azure Marketplace for tasks ranging from interactive analytics to application migration.