
Data Prep: The 80% Drudge Factor Holding Back Analytics

By Dick Weisinger

GIGO (Garbage In, Garbage Out) simply means that analysis and decision making based on poor raw data or information will necessarily be flawed.

Accurate data is increasingly important as businesses adopt tools based on AI, business analytics, and Big Data. Data preparation is the process of cleaning data before it is passed on for processing and analysis. Data prep involves transforming and reformatting the data, making corrections to it, and combining it with other data to enrich the source data.
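To make those steps concrete, here is a minimal sketch in Python using only the standard library. The records, field names, date layouts, and customer lookup table are all invented for illustration; real prep pipelines will differ, and dedicated tools handle far more cases than this.

```python
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
raw_orders = [
    {"customer_id": " 17", "amount": "1,200.50", "date": "2019-03-04"},
    {"customer_id": "42", "amount": "n/a", "date": "03/05/2019"},
    {"customer_id": "17", "amount": "80", "date": "2019-03-06"},
]

# Hypothetical lookup table used to enrich the orders.
customers = {17: "Acme Corp", 42: "Globex"}


def parse_date(text):
    """Reformat: accept two assumed date layouts and return an ISO date string."""
    for layout in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, layout).date().isoformat()
        except ValueError:
            continue
    return None  # unknown layout, flagged as missing


def parse_amount(text):
    """Correct: drop thousands separators; treat unparseable values as missing."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def prepare(orders, customer_names):
    """Transform each raw row, then enrich it with the customer name."""
    cleaned = []
    for row in orders:
        customer_id = int(row["customer_id"].strip())
        cleaned.append({
            "customer_id": customer_id,
            "customer_name": customer_names.get(customer_id),
            "amount": parse_amount(row["amount"]),
            "date": parse_date(row["date"]),
        })
    return cleaned


prepared = prepare(raw_orders, customers)
```

Even in this toy case, most of the code deals with inconsistent formats and bad values rather than with analysis, which is the heart of the data prep problem.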

Stewart Bond, research director of the Data Integration and Integrity Software service at IDC, said that “It’s the complexity of data environments in this day and age. There’s multiple different data types: There’s transactional data, master data, social media data, structured data, unstructured data, log file data, graph data. There’s all different kinds of data that is out there and there’s all different kinds of technologies that these data are being stored in.”

Assigning metadata to data sets can also make data more useful and enable “data intelligence.” Bond said that “It’s the intelligence to know where the data is, what the data means, who’s using it, who can get access to it, why we have the data, how long we need to keep the data, and how people are using it.”
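One way to picture the metadata Bond describes is as a small record attached to each data set. The sketch below is only an illustration: the class name, fields, and sample values are hypothetical and do not come from IDC or from any particular data catalog product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record covering the questions in the quote."""
    location: str          # where the data is
    description: str       # what the data means
    purpose: str           # why we have the data
    retention_days: int    # how long we need to keep the data
    allowed_roles: List[str] = field(default_factory=list)  # who can get access
    recent_users: List[str] = field(default_factory=list)   # who is using it

    def can_access(self, role: str) -> bool:
        """Return True if the given role is allowed to read the data set."""
        return role in self.allowed_roles


# Sample values are made up for illustration.
orders_meta = DatasetMetadata(
    location="warehouse/sales/orders",
    description="One row per customer order",
    purpose="Revenue reporting",
    retention_days=365 * 7,
    allowed_roles=["analyst", "finance"],
    recent_users=["jsmith"],
)
```

With a record like this, a question such as "can an analyst use this data?" becomes a lookup, for example `orders_meta.can_access("analyst")`, rather than a hunt through documentation.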

A 2016 report by CrowdFlower/Figure-Eight found that data preparation accounts for about 80 percent of the time spent by analysts. The statistic has stuck and is still frequently quoted. Despite new prep tools, data prep time remains a roadblock that has to be cleared before real data analytics can be done.
