The most popular and comprehensive Open Source ECM platform
The convergence of massive amounts of cheap storage, fast processors, and software algorithms like MapReduce is fueling the growth of Big Data technology. MapReduce is a computing framework that distributes the processing of very large datasets across many computers. Hadoop is an implementation of the MapReduce algorithm that combines it with a distributed file system modeled on the Google File System (GFS) and makes the software available under the free Apache license.
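To make the model concrete, here is a minimal, hypothetical sketch of the MapReduce pattern as a word count, the classic introductory example. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, not Hadoop API calls; a real Hadoop job expresses the same two user-supplied functions in Java and the framework runs them in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Each "document" could live on a different machine; the map step runs
# where the data is, and only the small (word, count) pairs move around.
documents = ["big data big storage", "big data processing"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 3, "data": 2, "storage": 1, "processing": 1}
```

The point of the pattern is that the map and reduce functions are independent per key, so the framework can split the work across as many machines as hold the data.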
Hadoop has spawned a major part of the Big Data industry. A recent CRN article reviews 11 up-and-coming startups that have built their business plans around Hadoop. Hadoop has primarily been viewed as a tool for crunching massive amounts of data to derive information and analytics.
Frank Ohlhorst, Senior Technology Editor at Ziff Davis Enterprise, points out that while Hadoop is an enabler of massive data crunching projects, it can also be viewed as an unstructured data storage platform. Ohlhorst wrote that “Hadoop solves the most common problem associated with big data: efficiently storing and accessing large amounts of data. The intrinsic design of Hadoop allows it to run as a platform that is able to work across a large number of machines that don’t share any memory or disks. With that in mind, it becomes easy to see how Hadoop offers additional value — network managers can simply buy a number of commodity servers, place them in a rack, and run the Hadoop software on each one.”
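The storage idea Ohlhorst describes can be sketched in a few lines: a file is cut into fixed-size blocks and each block is replicated onto several machines, so no single disk holds the whole file and no shared memory is needed. This is a toy illustration of the HDFS-style layout, not Hadoop's actual placement policy; the tiny block size and round-robin placement are assumptions for readability (real HDFS blocks are 64–128 MB and placement is rack-aware).

```python
BLOCK_SIZE = 4    # bytes per block (tiny for illustration; HDFS uses 64-128 MB)
REPLICATION = 2   # copies kept of each block
NODES = ["node1", "node2", "node3"]  # hypothetical commodity servers in a rack

def store_file(data):
    """Split data into blocks and place each block on REPLICATION distinct nodes."""
    placement = {}
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for index, block in enumerate(blocks):
        # simple round-robin onto distinct nodes; real HDFS is rack-aware
        replicas = [NODES[(index + r) % len(NODES)] for r in range(REPLICATION)]
        placement[index] = {"block": block, "nodes": replicas}
    return placement

layout = store_file(b"hello big data world")
# 20 bytes -> 5 blocks, each stored on 2 of the 3 nodes
```

Because every block lives on more than one machine, losing a server loses no data, which is what makes racks of cheap commodity hardware viable as a storage platform.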
Hadoop brings new capabilities that standard centralized storage platforms can’t offer, like “fault-tolerant clustered architecture and the capability to move computing power closer to the data and perform parallel and/or batch processing of large data sets. It also provides an open ecosystem that supports enterprise architecture layers from data storage to analytics processes.”
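The phrase "move computing power closer to the data" has a simple mechanical meaning: given the block-to-node layout, the scheduler prefers to run each block's task on a machine that already holds a replica, so the computation travels over the network instead of the data. The sketch below is a hypothetical illustration of that data-locality preference, not Hadoop's actual scheduler.

```python
def schedule(block_locations, busy=()):
    """Assign each block's task to a node holding a replica, when one is free."""
    assignments = {}
    for block, nodes in block_locations.items():
        free_local = [n for n in nodes if n not in busy]
        # prefer a free node that holds the block; fall back to a remote read
        assignments[block] = free_local[0] if free_local else nodes[0]
    return assignments

# layout: block index -> nodes holding a replica (as produced by a block store)
layout = {0: ["node1", "node2"], 1: ["node2", "node3"], 2: ["node1", "node3"]}
tasks = schedule(layout, busy={"node1"})
# block 0 -> node2, block 1 -> node2, block 2 -> node3: every read is local
```

Replication thus serves double duty: it provides the fault tolerance Ohlhorst mentions, and it gives the scheduler several candidate machines on which each piece of work can run locally and in parallel.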