Unstructured + Structured Data: The Challenge of Seeing the Two in Context
Analysts estimate that 80 percent of all new data created is unstructured. This sheer volume is driving much of the interest in Big Data tools like Hadoop. Extracting information from unstructured data is a big part of the challenge. Another part is seeing unstructured data in a bigger context and drawing relationships between the various elements of both unstructured and structured data.
Steve Andriole said that “unstructured data is noisy. One of the major challenges of unstructured data analytics (UDA) is finding diagnostic signals within mountains of unstructured noise. Once it’s cleaned and analyzed, unstructured data must then be integrated with structured data. This can be done manually or with the major business intelligence (BI) platforms that companies already have in their analytics arsenals.”
David S. Linthicum said that “these days, unstructured data is not contained in the simple raw data storage systems from years ago, nor is it all binary data, such as videos or audio. The growth pattern is in unstructured data that is also complex data. This means that we’re dealing with massive amounts of data that’s missing metadata. Moreover, that data is typically related to other structured or unstructured data, but those relationships are not tracked within the data storage systems.”
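To make the point about missing metadata and untracked relationships concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Document` class, the `customers` table, the IDs, and the helper functions are all hypothetical and do not come from any particular ECM or BI product. It shows what becomes possible once each unstructured item carries metadata and explicit links to structured records, and how an item with neither drops out of view.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A hypothetical unstructured item (e.g. a scanned contract)."""
    doc_id: str
    text: str
    # Descriptive metadata; empty when nobody has tagged the item.
    metadata: dict = field(default_factory=dict)
    # Explicit links to structured records (here, customer IDs).
    related_ids: list = field(default_factory=list)


# Hypothetical structured data: customer rows keyed by ID.
customers = {
    "C-100": {"name": "Acme Corp", "region": "EMEA"},
    "C-200": {"name": "Globex", "region": "APAC"},
}

documents = [
    Document("D-1", "Signed master services agreement ...",
             {"type": "contract"}, ["C-100"]),
    Document("D-2", "Support call transcript ...",
             {"type": "transcript"}, ["C-100", "C-200"]),
    # No metadata and no tracked relationships.
    Document("D-3", "Untitled scan ..."),
]


def documents_for(customer_id, docs):
    """Return the IDs of documents explicitly linked to a customer."""
    return [d.doc_id for d in docs if customer_id in d.related_ids]


def untracked(docs):
    """Return the IDs of documents with neither metadata nor links."""
    return [d.doc_id for d in docs if not d.metadata and not d.related_ids]


if __name__ == "__main__":
    for cid, row in customers.items():
        print(row["name"], documents_for(cid, documents))
    print("untracked:", untracked(documents))
```

In this sketch the third document can never be found starting from a customer record, which is the situation the quote describes: the content exists, but nothing ties it to the structured data around it.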