Gregory Barber, Artificial Intelligence Confronts a 'Reproducibility' Crisis, Wired, 9.16.19:
When Facebook attempted to replicate AlphaGo, the system developed by Alphabet’s DeepMind to master the ancient game of Go, the researchers appeared exhausted by the task. The vast computational requirements—millions of experiments running on thousands of devices over days—combined with unavailable code, made the system “very difficult, if not impossible, to reproduce, study, improve upon, and extend,” they wrote in a paper published in May. (The Facebook team ultimately succeeded.)
The problem is widespread:
Neural networks, the technique that’s given us Go-mastering bots and text generators that craft classical Chinese poetry, are often called black boxes because of the mysteries of how they work. Getting them to perform well can be like an art, involving subtle tweaks that go unreported in publications. The networks also are growing larger and more complex, with huge data sets and massive computing arrays that make replicating and studying those models expensive, if not impossible for all but the best-funded labs. “Is that even research anymore?” asks Anna Rogers, a machine-learning researcher at the University of Massachusetts. “It’s not clear if you’re demonstrating the superiority of your model or your budget.” [...] It’s one thing to marvel at the eloquence of a new text generator or the “superhuman” agility of a videogame-playing bot. But even the most sophisticated researchers have little sense of how they work. Replicating those AI models is important not just for identifying new avenues of research, but also as a way to investigate algorithms as they augment, and in some cases supplant, human decision-making—everything from who stays in jail and for how long to who is approved for a mortgage.
Reproducibility is hard:
“Starting where someone left off is such a pain because we never fully describe the experimental setup,” says Jesse Dodge, an AI2 researcher who coauthored the research. “People can’t reproduce what we did if we don’t talk about what we did.” It’s a surprise, he adds, when people report even basic details about how a system was built. A survey of reinforcement learning papers last year found only about half included code. Sometimes, basic information is missing because it’s proprietary—an issue especially for industry labs. But it’s more often a sign of the field’s failure to keep up with changing methods, Dodge says. A decade ago, it was more straightforward to see what a researcher changed to improve their results. Neural networks, by comparison, are finicky; getting the best results often involves tuning thousands of little knobs, what Dodge calls a form of “black magic.” Picking the best model often requires a large number of experiments. The magic gets expensive, fast.
What's the point?
The point of reproducibility, according to Dodge, isn’t to replicate the results exactly. That would be nearly impossible given the natural randomness in neural networks and variations in hardware and code. Instead, the idea is to offer a road map to reach the same conclusions as the original research, especially when that involves deciding which machine-learning system is best for a particular task.