Big Data: Businesses Wary of Analytics based on Unknown Social Media Data Quality

By Dick Weisinger

A report from IBM and Oxford’s Saïd Business School found that two-thirds of organizations believe Big Data could be useful to them. At this point, however, 70 percent of them are only in the very early stages of evaluating the technology.
  • The top four Big Data sources are transactions (88%), log data (73%), events (59%) and emails (57%)
  • The top five Big Data capabilities are reporting (91%), data mining (77%), data visualization (71%), predictive modeling (67%) and optimization (65%)
  • 6% of survey respondents are in the “Execute” phase of Big Data adoption, while 47% are still in the “Explore” phase
The IBM report also digs into data analysis and the difficulty of assessing data veracity, especially for data that is collected externally. The report comments that “Sentiment and truthfulness in humans; GPS sensors bouncing among the skyscrapers of Manhattan; weather conditions; economic factors; and the future. When dealing with these types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of Big Data.”
In particular, many organizations regard data collected from social media outlets as suspect. At this point, only 39 percent of those surveyed said that they collect any social data, and only 7 percent said that social-media-derived data plays an important part in their data analysis. While a top reason many businesses pursue Big Data projects is to improve their customer service and customer experience, many doubt that data derived from social media is accurate.

Garbage in, garbage out. It’s a major problem with any data analysis. Getting data collected and funneled into a system that’s capable of processing it is one issue. But once the data is captured, how accurate is it, and can any analysis based on it truly be trusted? Data cleansing is difficult to do well even with a data set that’s been generated internally and is well understood. Quantifying data quality is much harder when the data originated from external sources.

Matin Jouzdani, Strategy Consultant at IBM Global Business Services, further elaborated, saying that “One key reason for companies not collecting and analyzing wider varieties of data lies in the veracity – or truthfulness – of insights generated from sources such as real-time data and social media. Striving for high data quality is an important Big Data requirement, and the survey respondents questioned the ability to trust rapidly growing forms of unstructured data, such as those generated from on-line consumer comments, reviews, Tweets and other forms of freely offered opinions.”
