Access and Feeds

Big Data and Nomadic Computing: Speeding up Analytics of Complex Data Sets

By Dick Weisinger

Researchers at The University of Texas at Austin and the Texas Advanced Computing Center (TACC) are developing ways to speed up analytics processing of massive data sets.  One of the techniques developed by the university team is called nomadic computing.

Nomadic computing has nothing to do with Bedouins or gypsies.  The tool developed by the researchers is called NOMAD, which stands for “non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion.”  Data analysis with NOMAD is significantly faster than with other state-of-the-art techniques, and it can handle data sets so large that most other analytic tools simply cannot process them.

The NOMAD technique involves simplifying the complexity of data sets.  The idea is that when data comes with many parameters or dimensions, it can be simplified by identifying just the subset of dimensions that are the most meaningful.
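
In practice, this kind of simplification is usually framed as low-rank matrix completion: a large, mostly empty matrix of observations is approximated by the product of two thin factor matrices whose few latent dimensions explain the known entries.  The sketch below illustrates the general idea with plain stochastic gradient descent in Python; the rank, step size, and regularization values are illustrative assumptions, not settings taken from NOMAD itself.

```python
import numpy as np

def complete_matrix(observed, rank=2, steps=50_000, lr=0.01, reg=0.05, seed=0):
    """Approximate a sparse matrix by W @ H.T, learning W and H by SGD
    over only the observed (row, col, value) entries."""
    rng = np.random.default_rng(seed)
    n_rows = 1 + max(i for i, _, _ in observed)
    n_cols = 1 + max(j for _, j, _ in observed)
    W = 0.1 * rng.standard_normal((n_rows, rank))   # row factors (e.g. users)
    H = 0.1 * rng.standard_normal((n_cols, rank))   # column factors (e.g. items)
    for _ in range(steps):
        i, j, v = observed[rng.integers(len(observed))]  # sample one known entry
        wi = W[i].copy()
        err = v - wi @ H[j]                              # prediction error for that entry
        W[i] += lr * (err * H[j] - reg * wi)             # gradient step on the row factor
        H[j] += lr * (err * wi - reg * H[j])             # gradient step on the column factor
    return W, H

# Toy usage: five known entries of a 3x3 matrix; the learned product fills in the rest.
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 2, 2.0)]
W, H = complete_matrix(ratings)
print(np.round(W @ H.T, 2))
```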

Inderjit Dhillon, a professor of computer science at The University of Texas at Austin, said that “suppose you have a massive computational problem and you need to run it on datasets that do not fit in a computer’s memory.  If you want the answer in a reasonable amount of time, the logical thing to do would be to distribute the computations over different machines. We are trying to develop an asynchronous method where each parameter is, in a sense, a nomad.  The parameters go to different processors, but instead of synchronizing this computation followed by communication, the nomadic framework does its work whenever a variable is available at a particular processor.”
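
To make the “nomad” picture concrete, here is a toy Python sketch of that message-passing pattern: each worker owns a block of rows, while the column factors circulate around a ring of inboxes and get updated by whichever worker they happen to visit, with no global lock or synchronization barrier.  Threads and queues stand in for separate machines here; this is an illustration of the pattern under assumed toy parameters, not the researchers’ actual implementation.

```python
import queue
import threading
import numpy as np

RANK, LR, REG = 2, 0.02, 0.05

def worker(inbox, next_inbox, my_entries, W, done):
    # Owns a fixed block of rows of W; updates whatever column factor arrives.
    while True:
        msg = inbox.get()
        if msg is None:                          # sentinel: shut down
            break
        j, h, hops = msg                         # a "nomad" column factor arrives
        for i, v in my_entries.get(j, []):       # known entries of column j in my rows
            err = v - W[i] @ h
            wi = W[i].copy()
            W[i] += LR * (err * h - REG * wi)    # update my private row factor
            h = h + LR * (err * wi - REG * h)    # update the travelling column factor
        if hops > 0:
            next_inbox.put((j, h, hops - 1))     # pass the nomad to the next worker
        else:
            done.put((j, h))                     # this nomad has finished its tour

# Toy setup: 4 rows, 3 columns, 2 workers, a few known entries.
observed = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (3, 2, 2.0)]
n_rows, n_cols, n_workers, tours = 4, 3, 2, 400
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((n_rows, RANK))    # row factors, partitioned by owner
entries = [{} for _ in range(n_workers)]         # worker w owns rows with i % n_workers == w
for i, j, v in observed:
    entries[i % n_workers].setdefault(j, []).append((i, v))

inboxes = [queue.Queue() for _ in range(n_workers)]
done = queue.Queue()
threads = [threading.Thread(target=worker,
                            args=(inboxes[w], inboxes[(w + 1) % n_workers],
                                  entries[w], W, done))
           for w in range(n_workers)]
for t in threads:
    t.start()
for j in range(n_cols):                          # launch one nomad per column
    inboxes[j % n_workers].put((j, 0.1 * rng.standard_normal(RANK), tours * n_workers))

H = np.zeros((n_cols, RANK))
for _ in range(n_cols):                          # collect the column factors as tours finish
    j, h = done.get()
    H[j] = h
for box in inboxes:
    box.put(None)                                # stop the workers
for t in threads:
    t.join()
print(np.round(W @ H.T, 2))                      # the completed-matrix estimate
```

Because each row block has a single owner and the column factors travel by message, no two workers ever write to the same variable, which is what lets the updates proceed without locks.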

Amy Apon, a program director at NSF, said that “traditionally, machine learning inference algorithms run on a single large–and sometimes expensive–server, and this limits the size of the problem that can be addressed.  This team has noticed a property of some machine learning algorithms that if a few slowly changing variables can be only occasionally synchronized, then the work can be more easily distributed across different computers.  Their clever mathematical approach is opening doors to running machine learning algorithms on the kind of massive-scale, distributed, commodity computers that we find in today’s cloud computing environment.”
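
A generic way to exploit that property (again an illustration, not the team’s specific method) is to let each machine take many cheap gradient steps on its own shard of the data and only occasionally average the slowly changing model parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])

# Three "machines", each holding its own shard of (x, y) data.
shards = []
for _ in range(3):
    X = rng.standard_normal((200, 3))
    y = X @ true_w + 0.1 * rng.standard_normal(200)
    shards.append((X, y))

w_local = [np.zeros(3) for _ in shards]   # each machine's private copy of the model
lr, local_steps, sync_rounds = 0.05, 50, 20

for _ in range(sync_rounds):
    for k, (X, y) in enumerate(shards):   # runs independently on each machine
        w = w_local[k]
        for _ in range(local_steps):      # many cheap local SGD steps, no communication
            i = rng.integers(len(y))
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
        w_local[k] = w
    avg = sum(w_local) / len(w_local)     # the only occasional synchronization point
    w_local = [avg.copy() for _ in shards]

print(np.round(avg, 3))                   # close to true_w despite infrequent syncing
```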

 
