Unstructured data refers to documents and files made up of freeform information. Information in unstructured data is difficult to retrieve because it is typically text-heavy and has no predefined data model, such as a database schema or an XML or JSON structure. These documents often contain useful numbers, dates, facts and analysis, so it is unfortunate that unstructured data is often overlooked when using tools like data analytics.
A recent report by Clutch looked at which data sources business analysts typically use when applying data analytics. Analysts' go-to sources of information are typically internal data (70 percent), business systems data (59 percent) and structured data (58 percent). Although many tools like Hadoop were designed with a focus on unstructured data, extracting useful information from it typically still takes significant time, expertise and often special techniques. As a result, analysts use unstructured data sources on only 37 percent of projects, according to the report.
The Clutch report found that even with newer sources like the Internet of Things (IoT), social networks and external data available for analysis, most analytics is still done using internal data, business systems data and structured data.
Leif Hanlen, a business development executive at Data61, said that “unstructured data sitting inside the enterprise—in the customer relationship-management system, in fields called ‘other’—is like a hole in the ground that’s yet to become a goldmine. The task for analytics of unstructured data is not to build a brand new goldmine, but to extract elements of information from that unstructured data.”
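To make the idea of extracting elements of information a little more concrete, here is a minimal Python sketch that pulls dates and percentages out of a freeform note using regular expressions. The sample text, the `extract_facts` helper and the two patterns are hypothetical and chosen only for illustration. Real unstructured documents vary far more in format and usually need more robust techniques than this.

```python
import re

# Hypothetical free-text note, e.g. from a CRM "other" field (made up for illustration).
note = (
    "Spoke with the customer on 2017-03-14. They expect a 15 percent "
    "increase in orders and asked for a follow-up call on 2017-04-02."
)

# Assumption: dates are written as YYYY-MM-DD; real documents use many more formats.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Numbers followed by the word "percent" or a "%" sign.
PERCENT_PATTERN = re.compile(r"\b(\d+(?:\.\d+)?)\s*(?:percent|%)")


def extract_facts(text):
    """Return the dates and percentages found in a piece of free text."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "percentages": [float(p) for p in PERCENT_PATTERN.findall(text)],
    }


facts = extract_facts(note)
print(facts)
```

Even this toy case shows why the work takes time and expertise: each kind of fact needs its own pattern, and every new way of writing a date or a number means another rule to add and test.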