Machine Learning: Lack of Reproducibility Threatens Credibility

By Dick Weisinger

Reproducibility and repeatability form the foundation of scientific research. Science works best when researchers have enough information about previous work, its data, parameters, and methods, to reproduce its results and build on established, proven ideas.

Unfortunately, some fields, like machine learning, are moving so fast that these basics are often overlooked. After all, new frameworks and tools for machine learning are being introduced monthly, if not daily.

Pete Warden, a machine-learning researcher, said that “ML frameworks trade off exact numeric determinism for performance, so if by a miracle somebody did manage to copy the steps exactly, there would still be tiny differences in the end results! In many real-world cases, the researcher won’t have made notes or remember exactly what she did, so even she won’t be able to reproduce the model. Even if she can, the frameworks the model code depends on can change over time, sometimes radically, so she’d need to also snapshot the whole system she was using to ensure that things work.”
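In practice, the closest a researcher can get is to pin every source of randomness within reach and snapshot the software environment alongside the results. The sketch below shows one illustrative way to do that in Python; it assumes NumPy is installed, the commented-out PyTorch calls apply only if that framework is in use, and, as Warden notes, even all of this does not guarantee bit-exact results.

```python
import json
import random
import sys
from importlib import metadata

import numpy as np  # assumes NumPy is part of the project's stack

SEED = 42  # arbitrary; the value matters less than recording it

# Pin the RNGs we control. This narrows, but does not eliminate,
# run-to-run variation: parallel GPU kernels can still reorder
# floating-point operations.
random.seed(SEED)
np.random.seed(SEED)

# If PyTorch were the framework in use (an assumption, not a given here):
# import torch
# torch.manual_seed(SEED)
# torch.use_deterministic_algorithms(True)  # raises on nondeterministic ops

# Snapshot the software environment next to the results, so the exact
# framework versions can be restored later even if they change upstream.
snapshot = {
    "python": sys.version,
    "seed": SEED,
    "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
}
with open("environment_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```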

Denny Britz, a deep-learning researcher, wrote that “in practice, as everyone re-implements techniques using different frameworks and pipelines, comparisons become meaningless. In almost every Deep Learning model implementation there exist a huge number of ‘hidden variables’ that can affect results.”
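One defense is to make those hidden variables explicit: write every setting that could plausibly affect results, including the ones that usually live as unstated framework defaults, into a config file saved next to the metrics. A minimal sketch, with hypothetical parameter names chosen purely for illustration:

```python
import json

# Every value here is a knob that is often left implicit in a paper or
# README but can change results: init scheme, shuffle seed, preprocessing.
# The specific names and values are illustrative, not prescriptive.
config = {
    "learning_rate": 3e-4,
    "batch_size": 64,
    "weight_init": "he_normal",      # often an unstated framework default
    "data_shuffle_seed": 1234,       # changes example ordering, hence results
    "normalization": "per_channel",  # preprocessing silently differs by pipeline
    "early_stopping_patience": 5,
}

# Persist the config with the run's outputs, so comparing two
# implementations can start from their full settings, not just the
# headline hyperparameters.
with open("run_config.json", "w") as f:
    json.dump(config, f, indent=2)
```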

Researchers should heed the advice of David Donoho, a Stanford professor, who wrote that “computational reproducibility is not an afterthought — it is something that must be designed into a project from the beginning.”
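Designed-in reproducibility can start with something as small as a run manifest, recording the exact git commit, command line, and start time before any training begins. A sketch of one way to do that, assuming the project lives in a git repository:

```python
import json
import subprocess
import sys
from datetime import datetime, timezone

def write_run_manifest(path: str = "run_manifest.json") -> None:
    """Record provenance at the start of a run, before any results exist."""
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    manifest = {
        "git_commit": commit,
        "command": sys.argv,  # exact invocation, flags included
        "started_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)

if __name__ == "__main__":
    write_run_manifest()
```

Because the manifest is written first, it exists even for runs that crash halfway, which are often exactly the runs someone later needs to reconstruct.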
