Big Data: Businesses Wary of Analytics based on Unknown Social Media Data Quality
- The top four Big Data sources are transactions (88%), log data (73%), events (59%) and emails (57%)
- The top five Big Data capabilities are reporting (91%), data mining (77%), data visualization (71%), predictive modeling (67%) and optimization (65%)
- 6% of survey respondents are in the “Execute” phase of Big Data adoption, while 47% are still in the “Explore” phase
Garbage in, garbage out. It's a major problem with any data analysis. Getting data collected and funneled into a system that's capable of processing it is one issue. But once the data is captured, how accurate is it, and can any analysis based on it truly be trusted? Data cleansing is difficult to do well even with a data set that was generated internally and is well understood. Quantifying data quality is much harder when the data originated from external sources.
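To make "quantifying data quality" a little more concrete, here is a minimal sketch of two simple measures that can be computed on any batch of records before it is analyzed: completeness (how many records have every required field filled in) and uniqueness (how many records are distinct after light normalization). The names `QualityReport` and `profile`, and the choice of these two measures, are illustrative assumptions, not something taken from the survey. Real social media data would need far more than this, for example checks for spam, bots, or sarcasm.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int           # number of records examined
    completeness: float  # share of records with every required field non-empty
    uniqueness: float    # share of distinct records after normalization


def _norm(value):
    """Normalize a field value: None becomes '', text is trimmed and lowercased."""
    return "" if value is None else str(value).strip().lower()


def profile(records, required_fields):
    """Compute simple quality ratios for a list of dict records (hypothetical helper)."""
    total = len(records)
    if total == 0:
        return QualityReport(0, 0.0, 0.0)

    # A record is complete when every required field is non-empty after normalization.
    complete = sum(
        all(_norm(r.get(f)) for f in required_fields) for r in records
    )

    # Records that normalize to the same tuple of field values count as one.
    distinct = {tuple(_norm(r.get(f)) for f in required_fields) for r in records}

    return QualityReport(total, complete / total, len(distinct) / total)


if __name__ == "__main__":
    # Made-up sample comments, for illustration only.
    sample = [
        {"user": "a", "text": "Great product"},
        {"user": "a", "text": "great product "},
        {"user": "b", "text": ""},
        {"user": None, "text": "ok"},
    ]
    print(profile(sample, ["user", "text"]))
```

Scores like these do not say whether an opinion is truthful, which is the harder problem with external data, but they give a first, repeatable number to track before any analysis is trusted.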
Matin Jouzdani, Strategy Consultant at IBM Global Business Services, further elaborated, saying that “One key reason for companies not collecting and analyzing wider varieties of data lies in the veracity – or truthfulness – of insights generated from sources such as real-time data and social media. Striving for high data quality is an important Big Data requirement, and the survey respondents questioned the ability to trust rapidly growing forms of unstructured data, such as those generated from on-line consumer comments, reviews, Tweets and other forms of freely offered opinions.”