Data Drift: Models Break when Data Context Changes

By Dick Weisinger

Data may seem immutable and constant. But data is only relevant to the time and context in which it was collected. Time goes on. Contexts change.

This presents a problem for big data and data analytics applications. Data collected a year or more ago may no longer reflect current circumstances, and algorithms designed to predict from past data patterns may miss the mark.

Girish Pancha, the CEO and co-founder of StreamSets, said that “in the old days, we had a simple data lifecycle. Data went from databases into data warehouses and applications, and was then fed into BI and reports and dashboards. That was the primary means of consumption. But, in the new world, consumption has exploded. It’s not just BI but search, big data applications. You have OLAP SQL stores, a whole bunch of SQL on Hadoop flavors, NoSQL, NewSQL…People think of this as a greater variety of sources. But from our perspective, we think of it as the data itself drifts.”

COVID-19, for example, broke data models for many marketers. The rules changed overnight. Purchasing habits changed overnight.

Brandon Purcell, principal analyst at Forrester, said that “data scientists like to talk about the concept of data drift, and typically that happens over time. That process just accelerated and now companies have to start collecting new data and creating new models based on the data from the point when folks started sheltering in place.”
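
One simple way to notice this kind of drift is to compare a feature's distribution in older data against the same feature in recent data. The sketch below is only an illustration, not a method described by anyone quoted here: it computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions) and flags drift when that gap exceeds a cutoff. The function names, the sample numbers, and the 0.3 threshold are all assumptions chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples (0.0 to 1.0)."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def has_drifted(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds an illustrative threshold."""
    return ks_statistic(reference, current) > threshold


# Hypothetical weekly order values before and after a sudden change in habits.
before = [20, 22, 25, 21, 23, 24, 26, 22]
after = [40, 45, 38, 50, 42, 47, 44, 41]
print(has_drifted(before, before))  # identical samples: no drift flagged
print(has_drifted(before, after))   # non-overlapping samples: drift flagged
```

In practice the threshold would be tuned to the data, and a flagged feature is a prompt to investigate and possibly retrain, not proof that a model is broken.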

