Tech News

Furious AI researcher creates a list of non-reproducible machine learning papers

On February 14, a researcher who was frustrated with reproducing the results of a machine learning research paper opened up a Reddit account under the username ContributionSecure14 and posted the r/MachineLearning subreddit: “I just spent a week implementing a paper as a baseline and failed to reproduce the results. I realized today after googling for a bit that a few others were also unable to reproduce the results. Is there a list of such papers? It will save people a lot of time and effort.”

The post struck a nerve with other users on r/MachineLearning, which is the largest Reddit community for machine learning.

“Easier to compile a list of reproducible ones…,” one user responded.

“Probably 50%-75% of all papers are unreproducible. It’s sad, but it’s true,” another user wrote. “Think about it, most papers are ‘optimized’ to get into a conference. More often than not the authors know that a paper they’re trying to get into a conference isn’t very good! So they don’t have to worry about reproducibility because nobody will try to reproduce them.”

A few other users posted links to machine learning papers they had failed to implement and voiced their frustration with code implementation not being a requirement in ML conferences.

The next day, ContributionSecure14 created “Papers Without Code,” a website that aims to create a centralized list of machine learning papers that are not implementable.

“I’m not sure if this is the best or worst idea ever but I figured it would be useful to collect a list of papers which people have tried to reproduce and failed,” ContributionSecure14 wrote on r/MachineLearning. “This will give the authors a chance to either release their code, provide pointers or rescind the paper. My hope is that this incentivizes a healthier ML research culture around not publishing unreproducible work.”

Reproducing the results of machine learning papers

Machine learning researchers regularly publish papers on online platforms such as arXiv and OpenReview. These papers describe concepts and techniques that highlight new challenges in machine learning systems or introduce new ways to solve known problems. Many of these papers find their way into mainstream artificial intelligence conferences such as NeurIPS, ICML, ICLR, and CVPR.

Having source code to go along with a research paper helps a lot in verifying the validity of a machine learning technique and building on top of it. But this is not a requirement for machine learning conferences. As a result, many students and researchers who read these papers struggle with reproducing their results.

“Unreproducible work wastes the time and effort of well-meaning researchers, and authors should strive to ensure at least one public implementation of their work exists,” ContributionSecure14, who preferred to remain anonymous, told TechTalks in written comments. “Publishing a paper with empirical results in the public domain is pointless if others cannot build off of the paper or use it as a baseline.”

But ContributionSecure14 also acknowledges that there are sometimes legitimate reasons for machine learning researchers not to release their code. For example, some authors may train their models on internal infrastructure or use large internal datasets for pretraining. In such cases, the researchers are not at liberty to publish the code or data along with their paper because of company policy.

“If the authors publish a paper without code due to such circumstances, I personally believe that they have the academic responsibility to work closely with other researchers trying to reproduce their paper,” ContributionSecure14 says. “There is no point in publishing the paper in the public domain if others cannot build off of it. There should be at least one publicly available reference implementation for others to build off of or use as a baseline.”

In some cases, even if the authors release both the source code and data to their paper, other machine learning researchers still struggle to reproduce the results. This can be due to various reasons. For instance, the authors might cherry-pick the best results from several experiments and present them as state-of-the-art achievements. In other cases, the researchers might have used tricks such as tuning the parameters of their machine learning model to the test data set to boost the results. In such cases, even if the results are reproducible, they are not relevant, because the machine learning model has been overfitted to specific conditions and won’t perform well on previously unseen data.

“I think it is necessary to have reproducible code as a prerequisite in order to independently verify the validity of the results claimed in the paper, but [code alone is] not sufficient,” ContributionSecure14 said.

Efforts for machine learning reproducibility