Unstructured + Structured Data: The Challenge of Seeing the Two in Context

By Formtek

Analysts estimate that 80 percent of all new data created is unstructured. This crushing volume of data is driving much of the interest in Big Data tools like Hadoop. Extracting information from unstructured data is a big part of the challenge. Another part is seeing unstructured data in a broader context and drawing relationships between the elements of both unstructured and structured data.

Steve Andriole said that “unstructured data is noisy. One of the major challenges of unstructured data analytics (UDA) is finding diagnostic signals within mountains of unstructured noise. Once it’s cleaned and analyzed, unstructured data must then be integrated with structured data. This can be done manually or with the major business intelligence (BI) platforms that companies already have in their analytics arsenals.”

David S. Linthicum said that “these days, unstructured data is not contained in the simple raw data storage systems from years ago, nor is it all binary data, such as videos or audio. The growth pattern is in unstructured data that is also complex data. This means that we’re dealing with massive amounts of data that’s missing metadata. Moreover, that data is typically related to other structured or unstructured data, but those relationships are not tracked within the data storage systems.”
