Hadoop grew out of Apache Nutch, an open source web crawler, where it was built for batch indexing the content of web sites. Big Data analytics applications have since adopted Hadoop, but because the MapReduce algorithm is batch-oriented, it isn’t good at interactively querying data or analyzing real-time data streams.
Hadoop 2.0 attempts to expand the types of applications to which Hadoop can be applied. The new framework in Hadoop 2.0 is called YARN, an acronym for “Yet Another Resource Negotiator”; it is also referred to as MapReduce 2.0. It’s more generic than MapReduce and can host engines that handle live streams of data and interactive queries.
Arun Murthy, a co-founder of Hortonworks, said that “the power of YARN to enable applications to run ‘in’ Hadoop, instead of ‘on’ Hadoop, is the key to leveraging all other common services of the next-generation data platform, from security to data lifecycle management… You can now have both the batch MapReduce jobs and interactive SQL queries running right next to each other in YARN.”
Shaun Connolly, vice president of corporate strategy for Hortonworks, said that “YARN creates a cluster that is aware of all the different types of workloads and resource needs, so they can all cohabitate. You don’t get one workload dominating or taking over all the resources of the cluster.”
YARN extracts the resource management capability that was previously embedded in MapReduce and repackages it so that it can be used by new engines. With the new YARN framework, multiple applications can run simultaneously in Hadoop, all going through YARN as the common resource manager.
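The idea of several engines sharing one resource manager can be illustrated with a small toy model. The sketch below is not the YARN API: `ToyResourceManager`, its `max_share` cap, and the application names are all hypothetical, chosen only to show how a single shared allocator might keep one workload from taking over the whole cluster.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource manager (not the YARN API)."""
    total_containers: int
    max_share: float = 0.5  # assumed cap: fraction of the cluster one application may hold
    allocations: dict = field(default_factory=dict)

    def free(self) -> int:
        """Containers not currently held by any application."""
        return self.total_containers - sum(self.allocations.values())

    def request(self, app: str, wanted: int) -> int:
        """Grant up to `wanted` containers, limited by free capacity and the per-app cap."""
        held = self.allocations.get(app, 0)
        cap = int(self.total_containers * self.max_share)
        granted = max(0, min(wanted, self.free(), cap - held))
        self.allocations[app] = held + granted
        return granted

    def release(self, app: str, count: int) -> None:
        """Return containers to the shared pool when an application finishes work."""
        held = self.allocations.get(app, 0)
        self.allocations[app] = held - min(count, held)


# Three different kinds of workload ask the same manager for resources.
rm = ToyResourceManager(total_containers=10)
batch = rm.request("mapreduce-batch-job", 8)   # capped at half the cluster
sql = rm.request("interactive-sql", 3)
stream = rm.request("stream-processor", 4)     # gets whatever is left
print(batch, sql, stream, rm.free())  # prints "5 3 2 0"
```

In this sketch the batch job asks for most of the cluster but is held to its cap, so the interactive and streaming workloads still receive containers. Real YARN schedulers use richer policies (queues, priorities, memory and CPU dimensions); the fixed per-application cap here is only an assumption made to keep the example short.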
The new Hadoop 2.0 YARN framework enables:
- Greater scalability
- Improved cluster utilization
- Support for workloads other than MapReduce