Access and Feeds

Big Data and Nomadic Computing: Speeding up Analytics of Complex Data Sets

By Dick Weisinger

Researchers at The University of Texas at Austin and the Texas Advanced Computing Center (TACC) are developing ways to speed up analytics processing of massive data sets.  One of the techniques developed by the university team is called nomadic computing.

Nomadic computing has nothing to do with Bedouins or gypsies.  The tool developed by the researchers is called NOMAD, which stands for “non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion.”  Data analysis with NOMAD is significantly faster than with other state-of-the-art techniques, and it can handle data sets so large that most other analytic tools simply cannot process them.

The NOMAD technique involves simplifying the complexity of data sets.  The idea is that when data comes with many parameters or dimensions, it can be simplified by identifying just the subset of dimensions that are the most meaningful.
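
In practice, this kind of simplification is usually framed as low-rank matrix completion: a large, mostly empty matrix of observations is approximated by the product of two thin factor matrices whose few latent dimensions explain the known entries.  The sketch below illustrates the general idea with plain stochastic gradient descent in Python; the rank, step size, and regularization values are illustrative assumptions, not settings taken from NOMAD itself.

```python
import numpy as np

def complete_matrix(observed, rank=2, steps=50_000, lr=0.01, reg=0.05, seed=0):
    """Approximate a sparse matrix by W @ H.T, learning W and H by SGD
    over only the observed (row, col, value) entries."""
    rng = np.random.default_rng(seed)
    n_rows = 1 + max(i for i, _, _ in observed)
    n_cols = 1 + max(j for _, j, _ in observed)
    W = 0.1 * rng.standard_normal((n_rows, rank))   # row factors (e.g. users)
    H = 0.1 * rng.standard_normal((n_cols, rank))   # column factors (e.g. items)
    for _ in range(steps):
        i, j, v = observed[rng.integers(len(observed))]  # sample one known entry
        wi = W[i].copy()
        err = v - wi @ H[j]                              # prediction error for that entry
        W[i] += lr * (err * H[j] - reg * wi)             # gradient step on the row factor
        H[j] += lr * (err * wi - reg * H[j])             # gradient step on the column factor
    return W, H

# Toy usage: five known entries of a 3x3 matrix; the learned product fills in the rest.
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 2, 2.0)]
W, H = complete_matrix(ratings)
print(np.round(W @ H.T, 2))
```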

Inderjit Dhillon, a professor of computer science at The University of Texas at Austin, said that “suppose you have a massive computational problem and you need to run it on datasets that do not fit in a computer’s memory.  If you want the answer in a reasonable amount of time, the logical thing to do would be to distribute the computations over different machines. We are trying to develop an asynchronous method where each parameter is, in a sense, a nomad.  The parameters go to different processors, but instead of synchronizing this computation followed by communication, the nomadic framework does its work whenever a variable is available at a particular processor.”
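
To make the “nomad” picture concrete, here is a toy Python sketch of that message-passing pattern: each worker owns a block of rows, while the column factors circulate around a ring of inboxes and get updated by whichever worker they happen to visit, with no global lock or synchronization barrier.  Threads and queues stand in for separate machines here; this is an illustration of the pattern under assumed toy parameters, not the researchers’ actual implementation.

```python
import queue
import threading
import numpy as np

RANK, LR, REG = 2, 0.02, 0.05

def worker(inbox, next_inbox, my_entries, W, done):
    # Owns a fixed block of rows of W; updates whatever column factor arrives.
    while True:
        msg = inbox.get()
        if msg is None:                          # sentinel: shut down
            break
        j, h, hops = msg                         # a "nomad" column factor arrives
        for i, v in my_entries.get(j, []):       # known entries of column j in my rows
            err = v - W[i] @ h
            wi = W[i].copy()
            W[i] += LR * (err * h - REG * wi)    # update my private row factor
            h = h + LR * (err * wi - REG * h)    # update the travelling column factor
        if hops > 0:
            next_inbox.put((j, h, hops - 1))     # pass the nomad to the next worker
        else:
            done.put((j, h))                     # this nomad has finished its tour

# Toy setup: 4 rows, 3 columns, 2 workers, a few known entries.
observed = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (3, 2, 2.0)]
n_rows, n_cols, n_workers, tours = 4, 3, 2, 400
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((n_rows, RANK))    # row factors, partitioned by owner
entries = [{} for _ in range(n_workers)]         # worker w owns rows with i % n_workers == w
for i, j, v in observed:
    entries[i % n_workers].setdefault(j, []).append((i, v))

inboxes = [queue.Queue() for _ in range(n_workers)]
done = queue.Queue()
threads = [threading.Thread(target=worker,
                            args=(inboxes[w], inboxes[(w + 1) % n_workers],
                                  entries[w], W, done))
           for w in range(n_workers)]
for t in threads:
    t.start()
for j in range(n_cols):                          # launch one nomad per column
    inboxes[j % n_workers].put((j, 0.1 * rng.standard_normal(RANK), tours * n_workers))

H = np.zeros((n_cols, RANK))
for _ in range(n_cols):                          # collect the column factors as tours finish
    j, h = done.get()
    H[j] = h
for box in inboxes:
    box.put(None)                                # stop the workers
for t in threads:
    t.join()
print(np.round(W @ H.T, 2))                      # the completed-matrix estimate
```

Because each row block has a single owner and the column factors travel by message, no two workers ever write to the same variable, which is what lets the updates proceed without locks.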

Amy Apon, a program director at NSF, said that “traditionally, machine learning inference algorithms run on a single large–and sometimes expensive–server, and this limits the size of the problem that can be addressed.  This team has noticed a property of some machine learning algorithms that if a few slowly changing variables can be only occasionally synchronized, then the work can be more easily distributed across different computers.  Their clever mathematical approach is opening doors to running machine learning algorithms on the kind of massive-scale, distributed, commodity computers that we find in today’s cloud computing environment.”
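
A generic way to exploit that property (again an illustration, not the team’s specific method) is to let each machine take many cheap gradient steps on its own shard of the data and only occasionally average the slowly changing model parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])

# Three "machines", each holding its own shard of (x, y) data.
shards = []
for _ in range(3):
    X = rng.standard_normal((200, 3))
    y = X @ true_w + 0.1 * rng.standard_normal(200)
    shards.append((X, y))

w_local = [np.zeros(3) for _ in shards]   # each machine's private copy of the model
lr, local_steps, sync_rounds = 0.05, 50, 20

for _ in range(sync_rounds):
    for k, (X, y) in enumerate(shards):   # runs independently on each machine
        w = w_local[k]
        for _ in range(local_steps):      # many cheap local SGD steps, no communication
            i = rng.integers(len(y))
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
        w_local[k] = w
    avg = sum(w_local) / len(w_local)     # the only occasional synchronization point
    w_local = [avg.copy() for _ in shards]

print(np.round(avg, 3))                   # close to true_w despite infrequent syncing
```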

 
