Friday, December 13, 2019

Why does deep learning work?


As I understand this piece from MIT's Technology Review, the answer is simple: because that's how the universe is:
Now Lin and Tegmark say they’ve worked out why. The answer is that the universe is governed by a tiny subset of all possible functions. In other words, when the laws of physics are written down mathematically, they can all be described by functions that have a remarkable set of simple properties.

So deep neural networks don’t have to approximate any possible mathematical function, only a tiny subset of them.

To put this in perspective, consider the order of a polynomial function, which is the size of its highest exponent. So a quadratic equation like y = x² has order 2, the equation y = x²⁴ has order 24, and so on.

Obviously, the number of orders is infinite and yet only a tiny subset of polynomials appear in the laws of physics. “For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order,” say Lin and Tegmark. Typically, the polynomials that describe laws of physics have orders ranging from 2 to 4.
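As a back-of-the-envelope illustration of how restrictive "low order" is (my own sketch, not a calculation from the paper): a generic polynomial of degree at most d in n variables has C(n + d, d) free coefficients, a standard combinatorial count, so capping the order shrinks the function class dramatically. Taking n = 100 variables as an arbitrary example:

from math import comb

def num_coefficients(n_vars, max_order):
    # Number of monomials (and hence free coefficients) in a generic polynomial
    # of degree <= max_order in n_vars variables: C(n_vars + max_order, max_order).
    return comb(n_vars + max_order, max_order)

for order in (2, 4, 24):
    print(f"order {order:2d}: {num_coefficients(100, order):,} coefficients")

With 100 variables, order 2 gives about five thousand coefficients and order 4 about 4.6 million, while order 24 blows past 10²⁵. Restricting physics to orders 2 to 4 really does carve out a tiny corner of function space.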
And so:
“We have shown that the success of deep and cheap learning depends not only on mathematics but also on physics, which favors certain classes of exceptionally simple probability distributions that deep learning is uniquely suited to model,” conclude Lin and Tegmark.

That’s interesting and important work with significant implications. Artificial neural networks are famously based on biological ones. So not only do Lin and Tegmark’s ideas explain why deep learning machines work so well, they also explain why human brains can make sense of the universe. Evolution has somehow settled on a brain structure that is ideally suited to teasing apart the complexity of the universe.

This work opens the way for significant progress in artificial intelligence. Now that we finally understand why deep neural networks work so well, mathematicians can get to work exploring the specific mathematical properties that allow them to perform so well. “Strengthening the analytic understanding of deep learning may suggest ways of improving it,” say Lin and Tegmark.
Color me just a bit skeptical.

The research paper: Henry W. Lin, Max Tegmark, David Rolnick, "Why does deep and cheap learning work so well?", arXiv:1608.08225v4 [cond-mat.dis-nn]
We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through "cheap learning" with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various "no-flattening theorems" showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss, for example, we show that n variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.
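That last claim is the easiest one to make concrete. A single hidden layer needs at least 2^n neurons to multiply n inputs, whereas a deep network can combine the inputs pairwise in a binary tree, spending a small constant number of neurons per pairwise product (four, in the paper's construction). Here is a rough sketch of the gap; the deep-side count is my own tally of that tree construction, not a figure quoted in the paper:

def shallow_lower_bound(n):
    # No-flattening theorem: multiplying n inputs with ONE hidden layer
    # requires at least 2^n neurons.
    return 2 ** n

def deep_rough_count(n):
    # Rough tally for a deep alternative (assumption: binary tree of
    # pairwise products, ~4 neurons per product, n - 1 products in total).
    return 4 * (n - 1)

for n in (4, 8, 16, 32):
    print(f"n = {n:2d}:  shallow >= {shallow_lower_bound(n):>13,}   deep ~ {deep_rough_count(n)}")

That exponential-versus-linear gap is the heart of the "deep and cheap" argument.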
