Storage Station

A bird's eye view of the data storage industry.

RainStor Unveils New Connector for Hadoop to Archive Data

Although Hadoop is regularly used in analyzing archive data, there are few expressly designed connectors to archive volumes.


Enterprise data store provider RainStor on June 24 announced the availability of a new archive application for Hadoop 2.0 that is included with its latest release, RainStor 6.

This is significant because although Hadoop is regularly used to analyze archive data in enterprises, precious few connectors are expressly designed to link big data analytics on standard storage to archive volumes.

The new feature makes it easier for storage administrators to deploy an end-to-end solution on Hadoop for managing and analyzing high-value business data. Using RainStor's Archive App, users can conduct high-performance queries against secure multi-structured data in an efficient way, the company claims.

An archive is deployed when an organization has rapidly growing data that needs to be retained for ongoing business queries or when governance rules mandate that data be online and fully accessible for specific time frames. Business users often require analytic access to multiple years of history, stored as raw, detailed data, in order to derive business value and insights.

New capabilities include:

--Analytics performance speed-up: The new archive application features XQuery for hierarchical data and documents and extends analytics support to SQL 2003. According to RainStor, users get a 10X to 100X boost in query speed when running native SQL against a mix of structured data, semi-structured data and documents in the same cluster. Performance improvements also apply to queries against Hive, Pig and MapReduce. An archive on Hadoop should achieve performance levels on par with the source environment, which is typically a data warehouse.

--RainStor Application Management on Hadoop 2.0: RainStor is open, standards-based and designed to run on the Hadoop Distributed File System (HDFS). Certified on Hortonworks Data Platform 2.1 and Cloudera Enterprise 5, RainStor integrates with YARN to ensure full cooperation in managing resources across a busy Hadoop cluster. YARN, a sub-project of Hadoop at the Apache Software Foundation that was introduced in Hadoop 2.0, separates the resource management and processing components. RainStor integrates with Apache Ambari for cluster monitoring and with Hue for managing archive workflows. RainStor also provides connectivity through HCatalog, the de facto interface to relational data. These capabilities offer users increased flexibility in selecting the tools that best fit their needs.

--Governance for greater control: With the new Archive App, users gain enterprise-grade control of the data in Hadoop through data lifecycle management features for retention and expiry. Using a rules-based workflow, users specify which records or groups of records to keep or delete as they are loaded.
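
For readers unfamiliar with rules-based retention, the general idea of deciding at load time which records to keep and when they expire can be sketched roughly as follows. This is a generic illustration only, not RainStor's implementation or API: the `Record`, `RetentionRule`, `apply_rules` and `load` names, the first-match-wins policy and the keep-by-default fallback are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape, for illustration only.
    record_id: str
    category: str
    created: date


@dataclass(frozen=True)
class RetentionRule:
    # Hypothetical rule: a predicate plus a retention period in days.
    # retain_days=None means "keep indefinitely"; 0 means "delete at load".
    name: str
    matches: Callable[[Record], bool]
    retain_days: Optional[int]


def apply_rules(record: Record, rules: List[RetentionRule],
                today: date) -> Tuple[bool, Optional[date]]:
    """Return (keep, expiry_date) using the first rule that matches.

    Assumption: records that match no rule are kept with no expiry.
    """
    for rule in rules:
        if rule.matches(record):
            if rule.retain_days is None:
                return True, None
            expiry = record.created + timedelta(days=rule.retain_days)
            # Keep only records whose retention period has not yet ended.
            return expiry > today, expiry
    return True, None


def load(records: List[Record], rules: List[RetentionRule],
         today: date) -> List[Tuple[Record, Optional[date]]]:
    """Apply retention rules as records are loaded; drop expired ones."""
    kept = []
    for record in records:
        keep, expiry = apply_rules(record, rules, today)
        if keep:
            kept.append((record, expiry))
    return kept
```

In this sketch, a rule such as "retain trade records for seven years" becomes a `RetentionRule` with `retain_days=7 * 365`, and a record whose retention period has already passed is dropped during `load` rather than stored.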

The RainStor Archive App on Hadoop 2.0 will become available in July. For further details on key capabilities and pricing, send an email here.

Chris Preimesberger


Chris J. Preimesberger is Editor-in-Chief of eWEEK and responsible for all the publication's coverage. In his 15 years and more than 4,000 articles at eWEEK, he has distinguished himself in reporting...