Garbage in, garbage out. Machine learning (ML) and artificial intelligence (AI) algorithms often work by looking for patterns across huge volumes of data, but dirty or poor-quality data sets can throw a wrench into AI projects.
Nathaniel Gates, CEO of Alegion, said that “the single largest obstacle to implementing machine learning models into production is the volume and quality of the training data. This research reinforces our own experience, that data science teams new to building ROI-driven systems try to tackle training data preparation in house, and get overwhelmed.”
That sentiment is echoed in a Dimensional Research report that surveyed existing AI and ML projects. It found that 80 percent of them have stalled, with 96 percent of respondents saying that their problems and challenges typically relate to obtaining and labeling quality data.
Another report, by Cognilytica, found that 80 percent of AI project time was spent preparing the data. The report notes that, surprisingly, data prep requires a lot of human intervention.
As AI becomes increasingly important, so will the pressure to create tools that effectively clean the data being processed. Markets and Markets estimates that the data prep market will grow from $1.46 billion in 2016 to $3.93 billion by 2021. That’s a 25.2 percent annual growth rate.