Hadoop 2.0 arrives, bringing YARN

Alex Handy
October 16, 2013
The Apache Software Foundation announced the release of Apache Hadoop 2.0. This next-generation version significantly expands the platform's capabilities, thanks to the new YARN cluster resource manager. Additionally, the Hadoop Distributed File System (HDFS) was upgraded to support high availability and data snapshots.

Shaun Connolly, vice president of corporate strategy at Hortonworks, said that Hadoop 2.0 is the culmination of a great deal of work between his company and Apache. “If you look at the Hadoop 2.0 line, it's been in development for over two years. Our strategy has continued to be that we put a premium on the YARN work because each of these systems needs to plug in and inform YARN what the resources are so it can schedule the workloads appropriately,” he said.

With Hadoop 2.0's new core resource management implemented in the YARN project, Hadoop clusters running version 2.0 will no longer be limited to MapReduce jobs, said Connolly. YARN will allow other types of jobs to run across the cluster against the data inside HDFS. And because Hadoop 2.0 is binary compatible with existing Hadoop 1.x applications, data already stored in a Hadoop cluster can be left where it is during the upgrade.

HDFS was also upgraded in version 2.0. The primary change was to make HDFS highly available. As a result of this increase in reliability and stability, HDFS can now underpin real-time applications. The most common use case is an HBase database inside a Hadoop cluster serving as the back-end data store for an external-facing application. Prior to the high-availability changes in HDFS, HBase could not reliably serve as the database behind a public-facing application.

Based on all these changes, a number of next-generation Hadoop projects have been in the works at Hortonworks and inside the Apache Incubator. One of these is Apache Tez, a framework for near-real-time data processing in Hadoop.
