Data Poisoning: Machine Learning Susceptible to Manipulation by Hackers
Data poisoning is the alteration or manipulation of Machine Learning (ML) training data. The amount of data changed may be small, yet still enough both to avoid detection and to bias the algorithm toward a particular result.
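To make the mechanics concrete, here is a minimal, hypothetical sketch of one common poisoning technique, label flipping. The dataset, model, and 3% flip rate are illustrative choices, not drawn from any specific incident described here:

```python
# A toy label-flipping attack: flip a small fraction of training labels
# so the model is biased against one class, while overall accuracy
# changes little (helping the attack avoid detection).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Clean baseline model.
clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Poison: relabel ~3% of the training set, all taken from class 1.
rng = np.random.default_rng(0)
class1 = np.flatnonzero(y_train == 1)
flip = rng.choice(class1, size=int(0.03 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 0

poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

# Overall accuracy typically moves only slightly, but recall on the
# targeted class drops -- the attacker's intended bias.
print("clean accuracy:   ", clean.score(X_test, y_test))
print("poisoned accuracy:", poisoned.score(X_test, y_test))
mask = y_test == 1
print("clean recall (class 1):   ", clean.score(X_test[mask], y_test[mask]))
print("poisoned recall (class 1):", poisoned.score(X_test[mask], y_test[mask]))
```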
The popularity of Machine Learning has been accompanied by the free availability of data sets and pre-trained models. While these can be a great resource, particularly for organizations without large budgets to acquire data, they also raise the question of whether the data is reliable and trustworthy.
Eugene Bagdasaryan, a doctoral candidate at Cornell Tech, said that “with many companies and programmers using models and codes from open-source sites on the internet, it is important to review and verify materials before integrating them into your current system. If hackers are able to implement code poisoning, they could manipulate models that automate supply chains and propaganda, as well as resume-screening and toxic comment deletion.”
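One basic form of the "review and verify" step Bagdasaryan describes is checking a downloaded data set or model against the checksum its publisher advertises. A minimal sketch, where the file name and digest below are placeholders, not real values:

```python
# Verify a downloaded artifact against a published SHA-256 digest
# before loading it into a pipeline.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "replace-with-the-publisher's-advertised-digest"

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks and return its hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

artifact = Path("pretrained_model.bin")  # hypothetical download
if sha256_of(artifact) != EXPECTED_SHA256:
    raise SystemExit("checksum mismatch: do not load this artifact")
```

A checksum only confirms the file matches what the publisher released; it does not prove the publisher's own training data was clean, which is why provenance review matters as well.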
Hyrum Anderson, principal architect for Trustworthy Machine Learning at Microsoft, told CSO Online that “there’s this whole notion in academia right now that I think is really cool and not yet practical, but we’ll get there, that’s called machine unlearning. For GPT-3, the cost was $16 million or something to train the model once. If it were poisoned and identified after the fact, it could be really expensive to find the poisoned data and retrain. But if I could unlearn, if I could just say ‘Hey, for these data, undo their effects and my weights,’ that could be a significantly cheaper way to build a defense. I think practical solutions for machine unlearning are still years away, though. So yes, the solution at this point is to retrain with good data and that can be super hard to accomplish or expensive.”
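Anderson's present-day fallback, retraining with good data once the poisoned records are identified, looks roughly like the following continuation of the earlier label-flipping sketch (it reuses `X_train`, `y_poisoned`, `flip`, `X_test`, and `y_test` from that block; in practice, identifying which records are poisoned is the hard and expensive part his quote alludes to):

```python
# Naive "retrain with good data": drop the identified poisoned rows
# and fit again. Variables come from the label-flipping sketch above.
import numpy as np
from sklearn.linear_model import LogisticRegression

keep = np.setdiff1d(np.arange(len(y_poisoned)), flip)
retrained = LogisticRegression(max_iter=1000).fit(X_train[keep], y_poisoned[keep])
print("retrained accuracy:", retrained.score(X_test, y_test))
```

For a small model this refit is cheap; the point of the quote is that at GPT-3 scale the same step costs millions, which is what makes machine unlearning an attractive, if still impractical, alternative.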