
Unstructured Data: Using Analytics to Make Sense of Dark Data’s Secrets

By Dick Weisinger

Unstructured data is electronic information that isn't rigidly divided into small units and stored in data structures such as database tables or the elements of an XML document. Because of that, unstructured data is difficult to search thoroughly and to understand. It is free-flowing data that isn't easily described by a template.

Unstructured data is often opaque to data management and analysis tools: it's hard to extract actionable meaning and information from it in an automated way. The term 'dark data' is used specifically to describe this difficulty of deriving information from unstructured data. Documents are created and stored away, but often few people other than their creators know what kind of information is in them, or what risks and value they contain.

Harald Collet, global head of Bloomberg Vault, said that "this is a way to help the industry solve its challenge around unstructured data. Eighty percent of [unstructured] data is generated by humans in the form of documents, emails, and recorded phone calls and is typically harder for employees to manage." Many organizations have great difficulty organizing all of this information, which ranges from office documents to contracts to shared information. Pointing to the large volume of data that organizations maintain, Collet noted that it "represents discovery risk." He said that "businesses today struggle to capture and organize the many different types of unstructured data in corporate file shares and enterprise repositories. This 'dark data' presents firms with significant challenges and opportunities."

To help solve the unstructured data problem, Bloomberg Vault created a product called File Analytics. The solution adds a server that runs on-premises and scans, collects, and indexes the metadata of files and documents. That metadata is then pushed to Bloomberg Vault in the cloud, where it is analyzed. In the cloud, templates are used to identify data characteristics and to track, manage, and analyze the data. Retention and deletion policies are then applied to the collected data.
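The general pattern described above (an on-premises scanner that indexes file metadata, templates that classify it, and retention policies applied to the result) can be sketched in a few lines of Python. This is not Bloomberg's implementation or API: the category templates, retention periods, and function names (`scan_metadata`, `classify`, `build_index`) are hypothetical, and the step that uploads the index to a cloud service is left out.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    extension: str
    size_bytes: int
    modified: float  # POSIX timestamp of last modification


# Hypothetical "templates": map a file extension to a content category.
CATEGORY_TEMPLATES = {
    ".docx": "office document",
    ".xlsx": "office document",
    ".pdf": "document",
    ".eml": "email",
    ".msg": "email",
    ".wav": "recorded call",
    ".mp3": "recorded call",
}

# Hypothetical retention periods, in days, per category.
RETENTION_DAYS = {"email": 365 * 7, "recorded call": 365 * 5}
DEFAULT_RETENTION_DAYS = 365 * 3


def scan_metadata(root):
    """Walk a file share and collect metadata only; file contents are never read."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # skip broken links or files removed mid-scan
            records.append(FileRecord(
                path=full,
                extension=os.path.splitext(name)[1].lower(),
                size_bytes=st.st_size,
                modified=st.st_mtime,
            ))
    return records


def classify(record):
    """Apply the category templates; unknown extensions stay unclassified."""
    return CATEGORY_TEMPLATES.get(record.extension, "unclassified")


def retention_action(record, now):
    """Flag a file for deletion once it is older than its category's retention period."""
    limit_days = RETENTION_DAYS.get(classify(record), DEFAULT_RETENTION_DAYS)
    age_days = (now - record.modified) / 86400
    return "flag for deletion" if age_days > limit_days else "retain"


def build_index(records, now=None):
    """Build the metadata index that a scanner might send on for analysis."""
    now = time.time() if now is None else now
    return [
        {"path": r.path, "category": classify(r), "action": retention_action(r, now)}
        for r in records
    ]
```

For example, `build_index(scan_metadata("/mnt/file-share"))` (a hypothetical mount point) returns one entry per file with its category and retention decision. Collecting only metadata mirrors the article's description of what gets pushed to the cloud; a real product would classify on far richer signals than file extensions.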

The Bloomberg Vault is just one example of a product that vendors are creating to help businesses get their unstructured data under control.
