Data Analysis: Without Data Quality, Data Analysis Is Useless
The results of your data analysis can only be as good as the quality of your data. That’s another way of stating the well-known maxim: garbage in, garbage out.
Data is powering the rapid growth of technologies like Big Data, Data Analytics, and Machine Learning, but if the data isn’t correctly captured and prepared before being processed, the results will have little meaning.
A recent survey by Paxata found that most companies struggle with how they collect and prepare the data they use for analysis. According to the survey, only about 40 percent of businesses sized at $100 million or more have a mature process for collecting and preparing their data.
When preparing data, the largest shares of processing time typically go to data ingestion (30 percent), data profiling (21 percent), and data remediation (21 percent).
What makes data prep difficult is that data can come from many sources, and the variety and complexity of the different data formats make it hard to convert the data into a consistent format that data-based applications can easily consume. The Paxata survey found that, on average, 37 percent of an organization’s data comes from second- and third-party sources. Much of the data that organizations would like to include in their analysis also comes from unstructured sources.
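To make the three preparation stages more concrete, here is a minimal Python sketch of ingestion, profiling, and remediation for the same kind of record arriving in two different formats. It is only an illustration: the sample data, the field names, and the function names ingest, profile, and remediate are all hypothetical and do not come from the Paxata survey or any particular product.

```python
import csv
import io
import json

# Hypothetical sample inputs: similar records arriving as CSV and as JSON.
CSV_SOURCE = "name,revenue\nAcme, 1200\nGlobex,\n"
JSON_SOURCE = '[{"company": "Initech", "revenue_usd": "950"}]'


def ingest():
    """Ingestion: read both sources into one list of dicts with shared keys."""
    records = []
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        records.append({"name": row["name"], "revenue": row["revenue"]})
    for item in json.loads(JSON_SOURCE):
        records.append({"name": item["company"], "revenue": item["revenue_usd"]})
    return records


def profile(records):
    """Profiling: count empty or missing values per field."""
    missing = {}
    for record in records:
        for field, value in record.items():
            if value is None or str(value).strip() == "":
                missing[field] = missing.get(field, 0) + 1
    return missing


def remediate(records):
    """Remediation: trim whitespace, convert revenue to int, mark gaps as None."""
    cleaned = []
    for record in records:
        raw = (record["revenue"] or "").strip()
        cleaned.append({
            "name": record["name"].strip(),
            "revenue": int(raw) if raw else None,
        })
    return cleaned


if __name__ == "__main__":
    records = ingest()
    print(profile(records))
    print(remediate(records))
```

Even in this toy case, most of the code deals with reconciling field names, stray whitespace, and missing values rather than with analysis, which is consistent with the point that preparation takes up much of the effort.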