Machine Learning: Lack of Reproducibility Threatens Credibility
Reproducibility and repeatability form the foundation of scientific research. Science works best when researchers know enough about the parameters of earlier work to reproduce its results and then build on established, proven ideas.
Unfortunately, some fields, such as Machine Learning, are moving so fast that these basics are often overlooked. After all, new frameworks and tools for machine learning are being introduced monthly, if not daily.
Peter Warden, Machine Learning researcher, said that “ML frameworks trade off exact numeric determinism for performance, so if by a miracle somebody did manage to copy the steps exactly, there would still be tiny differences in the end results! In many real-world cases, the researcher won’t have made notes or remember exactly what she did, so even she won’t be able to reproduce the model. Even if she can, the frameworks the model code depend on can change over time, sometimes radically, so she’d need to also snapshot the whole system she was using to ensure that things work.”
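Two of the points in this quote can be illustrated with a small sketch. It is not taken from Warden's post: the function names `snapshot_environment` and `seeded_run` are hypothetical, and it uses only the Python standard library rather than any particular ML framework. The first part records the interpreter, operating system, and installed package versions alongside the random seed, which is one lightweight way to "snapshot" a setup. The last lines show why the order of floating-point operations matters, which is one common source of the tiny differences he describes when work is split up for performance.

```python
import json
import platform
import random
import sys
from importlib import metadata


def snapshot_environment(seed: int) -> dict:
    """Collect interpreter, OS, and installed-package versions for one run."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "seed": seed,
        "python": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def seeded_run(seed: int) -> list:
    """Stand-in for a training run: the output depends only on the seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]


if __name__ == "__main__":
    record = snapshot_environment(seed=1234)
    # Store this next to the model so the setup can be rebuilt later.
    print(json.dumps(record, indent=2)[:400])

    # The same seed gives the same draws on the same Python version.
    print(seeded_run(1234) == seeded_run(1234))

    # Floating-point addition is not associative, so summing the same
    # numbers in a different order can give a slightly different result.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

A record like this does not guarantee bit-for-bit identical results, but it at least documents which versions produced a given model, which is the information Warden notes is usually missing.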
Denny Britz, Deep Learning Researcher, wrote that “in practice, as everyone re-implements techniques using different frameworks and pipelines, comparisons become meaningless. In almost every Deep Learning model implementation there exist a huge number [of] ‘hidden variables’ that can affect results.”
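One way to make such hidden variables visible is to gather them in a single explicit configuration object and attach a fingerprint of it to every reported result. The sketch below is an illustration only, not something Britz proposes: the `RunConfig` class, its field names and default values, and the `fingerprint` helper are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical examples of settings that often go unreported.
    learning_rate: float = 1e-3
    batch_size: int = 32
    weight_init: str = "glorot_uniform"
    shuffle_seed: int = 0
    preprocessing: str = "lowercase+tokenize"


def fingerprint(config: RunConfig) -> str:
    """Return a short, stable hash of every field in the configuration."""
    # sort_keys gives a canonical string, so equal configs hash equally.
    canonical = json.dumps(asdict(config), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    baseline = RunConfig()
    variant = RunConfig(batch_size=64)
    print(fingerprint(baseline))
    print(fingerprint(variant))
    # Two results are only directly comparable if their fingerprints match.
    print(fingerprint(baseline) == fingerprint(variant))
```

If two implementations report different fingerprints, a reader knows at a glance that some setting differs, even before looking for which one.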
Researchers need to take advice from David Donoho, Stanford Professor, who wrote “computational reproducibility is not an afterthought — it is something that must be designed into a project from the beginning.”