
Big Data: Best Practices for Building Competency with Hadoop

By Dick Weisinger

Big Data platforms based on Hadoop and MapReduce technology are becoming increasingly popular. But how can organizations most effectively use them?

James G. Kobielus, senior analyst at Forrester Research, recommends the following strategy for building a framework of Hadoop best practices in the enterprise, drawn from his report “Enterprise Hadoop Best Practices: Concrete Guidelines From Early Adopters in Online Services”:

Align Hadoop with your Big Data business priorities. Define your organization’s strategy for Big Data. Prioritize those projects and look to see how Hadoop could benefit them. Kobielus said that “Enterprises should have their big data priorities straight before deciding to address them with Hadoop, an EDW, some hybrid of the two, or another approach entirely… Start with a well-scoped Hadoop project with a clear impact and near-term payoff on your core big data imperative.”

Integrate Hadoop with enterprise software such as an Enterprise Data Warehouse (EDW). Kobielus said that “Hadoop has considerable promise in cloud EDW to support extremely scalable analytics, which is big data’s core application.”
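
As a concrete illustration (a sketch of one common integration path, not an example from the report), a MapReduce job can write its aggregated results directly into a JDBC-reachable warehouse table using Hadoop's DBOutputFormat; the driver class, connection string, credentials, table, and column names below are all hypothetical placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

    public class EdwExportJob {
        // Wires a job to push reducer output straight into a warehouse table.
        // Note: the reducer's output key class must implement DBWritable.
        public static Job configure() throws Exception {
            Configuration conf = new Configuration();
            DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",                // JDBC driver on the cluster classpath
                "jdbc:mysql://edw-host:3306/warehouse", // hypothetical EDW connection string
                "etl_user", "secret");
            Job job = Job.getInstance(conf, "hadoop-to-edw-export");
            job.setOutputFormatClass(DBOutputFormat.class);
            // Hypothetical target table and the columns to populate.
            DBOutputFormat.setOutput(job, "daily_aggregates",
                "metric_date", "metric_name", "metric_value");
            return job;
        }
    }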

Build Hadoop implementations with enterprise-grade platforms. There are many Hadoop products available now; evaluate them carefully before deciding which is right for your organization. But Kobielus cautioned against standardizing on any one vendor or product: things are currently changing too quickly, and locking yourself into any one now could be a mistake.

Don’t build your Hadoop cluster any bigger than you need to. Kobielus commented that organizations should “explore a big data approach like Hadoop only if your data volumes are likely to scale into the high terabytes or even petabytes. If you overinvest in data storage, computing and networking capacity, you’ll add to your cost overhead without any concomitant business benefit.”
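
To see why over-provisioning is easy, consider a back-of-envelope capacity calculation; the data volume, replication factor, shuffle headroom, and per-node disk figures below are illustrative assumptions, not recommendations:

    public class ClusterSizing {
        public static void main(String[] args) {
            double dataTb        = 100.0; // assumed data volume to store, in TB
            int    replication   = 3;     // HDFS default replication factor
            double tempHeadroom  = 1.25;  // assumed headroom for intermediate/shuffle data
            double diskPerNodeTb = 12.0;  // assumed, e.g. six 2 TB disks per worker node

            // Every logical TB costs (replication x headroom) in raw disk.
            double rawDiskTb = dataTb * replication * tempHeadroom;
            int nodes = (int) Math.ceil(rawDiskTb / diskPerNodeTb);

            System.out.printf("Raw disk needed: %.0f TB -> about %d worker nodes%n",
                    rawDiskTb, nodes);
        }
    }

With these assumed numbers, 100 TB of data already demands 375 TB of raw disk and roughly 32 worker nodes; sizing for data volumes you don't yet have multiplies that cost with no offsetting benefit.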

Architect Hadoop projects so that they can later be combined. Kobielus said that “Hadoop clusters implement a common stack of Hadoop subprojects, from the storage layer on up. This architectural approach facilitates subsequent convergence of silos as well as easy promotion of MapReduce and other jobs between the silos… Be sure to align your tactical Hadoop deployments so you can integrate and federate them as needed into a shared-service utility.”
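
One practical way to keep jobs promotable between clusters (a sketch under my own assumptions, not a prescription from the report) is to avoid hard-coding cluster addresses. A job skeleton that extends Hadoop's Configured class and implements Tool can be pointed at any cluster at submit time via generic options such as -fs and -conf; the mapper and reducer are omitted here for brevity, so Hadoop's identity defaults apply:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class PortableJob extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already reflects any -fs/-conf generic options,
            // so no cluster address is baked into the code.
            Job job = Job.getInstance(getConf(), "portable-job");
            job.setJarByClass(PortableJob.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // Example: hadoop jar job.jar PortableJob -fs hdfs://other-cluster:8020 /in /out
            System.exit(ToolRunner.run(new PortableJob(), args));
        }
    }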

Build up Hadoop expertise among staff members. Reach out to the community, build expertise in MapReduce technology, and seek out guidance from consultants and cloud/SaaS providers like Amazon and Appistry.
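
The classic first exercise for building MapReduce competency is word count; a minimal version against the standard Hadoop Java API, essentially the standard tutorial example, looks like this:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts emitted for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // combiner cuts shuffle traffic
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }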
