Tuesday, January 22, 2019

Formal limits on machine learning

Ashutosh Jogalekar, Open Borders, 3 Quarks Daily, Jan 21, 2019:
The continuum hypothesis is related to two different kinds of infinities found in mathematics. When I first heard the fact that infinities can actually be compared, it was as if someone had cracked my mind open by planting a firecracker inside it. There is the first kind of infinity, the “countable infinity”, which is defined as an infinite set that maps one-on-one with the set of natural numbers. Then there’s the second kind of infinity, the “uncountable infinity”, a gnarled forest of limitless complexity, defined as an infinity that cannot be so mapped. Real numbers are an example of such an uncountable infinity. One of the staggering results of mathematics is that the infinite set of real numbers is somehow “larger” than the infinite set of natural numbers. The German mathematician Georg Cantor supplied the proof of the uncountable nature of the real numbers, sometimes called the “diagonal proof”. It is like a beautiful gem that has suddenly fallen from the sky into our lap; reading it gives one intense pleasure.
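The diagonal proof mentioned above can be illustrated with a small finite sketch (my own illustration, not part of the quoted article): given any list of binary sequences, flipping the digit on the diagonal produces a sequence that differs from every entry in the list, so no list can exhaust them all.

```python
# Finite illustration of Cantor's diagonal argument: build a sequence
# that disagrees with the i-th listed sequence at position i, and so
# cannot appear anywhere in the list.

def diagonal_counterexample(listing):
    """Return a 0/1 sequence differing from listing[i] at position i."""
    return [1 - listing[i][i] for i in range(len(listing))]

listing = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
d = diagonal_counterexample(listing)
print(d)  # [1, 0, 1, 0]

# d disagrees with each row at the diagonal position, hence is absent
# from the list no matter how the list was chosen.
assert all(d[i] != listing[i][i] for i in range(len(listing)))
```

Applied to infinite binary expansions of real numbers, the same flip defeats any purported enumeration, which is why the reals are uncountable.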

The continuum hypothesis asks whether there is an infinity whose size lies between the countable infinity of the natural numbers and the uncountable infinity of the real numbers. The mathematicians Kurt Gödel and – more notably – Paul Cohen did not prove whether the hypothesis is correct or not; instead they proved something equally or even more interesting: that the continuum hypothesis cannot be decided one way or the other within the standard axioms of set theory. Thus, there is a world of mathematics in which the hypothesis is true, and there is one in which it is false, and our current understanding of mathematics is consistent with both these worlds.

Fifty years later, computational mathematicians have found a startling and unexpected connection between the truth or falsity of the continuum hypothesis and the idea of learnability in machine learning. Machine learning seeks to learn the details of a small set of data and make correlative predictions for larger datasets based on these details. Learnability means that an algorithm can learn parameters from a small subset of data and accurately extrapolate to the larger dataset based on these parameters. The recent study found that whether learnability is possible for arbitrary, general datasets depends on whether the continuum hypothesis is true. If it is true, then one will always find a subset of data that is representative of the larger, true dataset. If the hypothesis is false, then one will never be able to pick such a dataset. In fact, in that case only the true dataset represents the true dataset, much as only an accused man can best represent himself.

This new result extends both set theory and machine learning into urgent and tantalizing territory. If the continuum hypothesis is false, it means that we will never be able to guarantee being able to train our models on small data and extrapolate to large data.
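The learnability notion at stake in the paper is the “estimating the maximum” (EMX) problem: from a small sample drawn from an unknown distribution, pick the set in a given family whose probability mass is largest. A toy finite version can be sketched as follows (the family names and data here are invented for illustration; the undecidability in the paper arises only for an infinite family of finite subsets of an uncountable domain, which no finite demo can reproduce):

```python
import random

# Toy sketch of the EMX setup: a learner sees a small i.i.d. sample and
# returns the set in a fixed family with the largest empirical measure,
# hoping it also has near-maximal true measure.

random.seed(0)
domain = list(range(100))  # uniform distribution over 100 points

family = {
    "low_half":  set(range(0, 50)),   # true measure 0.50
    "high_half": set(range(50, 100)), # true measure 0.50
    "middle":    set(range(25, 85)),  # true measure 0.60, the maximum
}

sample = [random.choice(domain) for _ in range(30)]

def empirical_measure(s, sample):
    """Fraction of sample points falling inside set s."""
    return sum(1 for x in sample if x in s) / len(sample)

# The learner's output: the empirically heaviest set in the family.
best = max(family, key=lambda name: empirical_measure(family[name], sample))
print(best, empirical_measure(family[best], sample))
```

In this finite setting the empirical winner converges to the true winner as the sample grows; the paper's point is that for its general family ℱ∗ the very existence of such a sample-based learner is independent of the standard axioms.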
Davide Castelvecchi, Machine learning leads mathematicians to unsolvable problem, Nature, Jan 9, 2019:
In the latest paper, Yehudayoff and his collaborators define learnability as the ability to make predictions about a large data set by sampling a small number of data points. The link with Cantor’s problem is that there are infinitely many ways of choosing the smaller set, but the size of that infinity is unknown.

The authors go on to show that if the continuum hypothesis is true, a small sample is sufficient to make the extrapolation. But if it is false, no finite sample can ever be enough. In this way they show that the problem of learnability is equivalent to the continuum hypothesis. The learnability problem, too, is therefore in a state of limbo that can be resolved only by choosing the axiomatic universe.

The result also helps to give a broader understanding of learnability, Yehudayoff says. “This connection between compression and generalization is really fundamental if you want to understand learning.”
And here's the research paper: Shai Ben-David, Pavel Hrubeš, Shay Moran, Amir Shpilka & Amir Yehudayoff, Learnability can be undecidable, Nature Machine Intelligence 1, 44–48 (2019):
Abstract: The mathematical foundations of machine learning play a key role in the development of the field. They improve our understanding and provide tools for designing new learning paradigms. The advantages of mathematics, however, sometimes come with a cost. Gödel and Cohen showed, in a nutshell, that not everything is provable. Here we show that machine learning shares this fate. We describe simple scenarios where learnability cannot be proved nor refuted using the standard axioms of mathematics. Our proof is based on the fact that the continuum hypothesis cannot be proved nor refuted. We show that, in some cases, a solution to the ‘estimating the maximum’ problem is equivalent to the continuum hypothesis. The main idea is to prove an equivalence between learnability and compression.
And the conclusion:
The main result of this work is that the learnability of the family of sets ℱ∗ over the class of probability distributions 𝒫∗ is undecidable. While learning ℱ∗ over 𝒫∗ may not be directly related to practical machine learning applications, the result demonstrates that the notion of learnability is vulnerable. In some general yet simple learning frameworks there is no effective characterization of learnability. In other words, when trying to understand learnability, it is important to pay close attention to the mathematical formalism we choose to use.

How come learnability can neither be proved nor refuted? A closer look reveals that the source of the problem is in defining learnability as the existence of a learning function rather than the existence of a learning algorithm. In contrast with the existence of algorithms, the existence of functions over infinite domains is a (logically) subtle issue.

The advantage of the current standard definitions (which use the language of functions) is that they separate the statistical or information-theoretic issues from any computational considerations. This choice plays a role in the fundamental characterization of PAC learnability by the VC dimension. Our work shows that this set-theoretic view of learnability has a high cost when it comes to more general types of learning.
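The VC-dimension characterization mentioned in that conclusion can be made concrete with a small sketch (my own illustration, assuming the standard definition of shattering, not code from the paper): closed intervals on the real line can realize every labeling of any two points, but never the alternating labeling of three points, so their VC dimension is 2 and interval classifiers are PAC learnable.

```python
# Sketch of VC-style shattering for the class of closed intervals [a, b].
# A point set is shattered if intervals realize all 2^n labelings of it.

def interval_labels(points):
    """All 0/1 labelings of `points` achievable by some interval [a, b]."""
    labelings = set()
    # Endpoints at the points themselves suffice, plus one empty interval.
    candidates = [(a, b) for a in points for b in points if a <= b]
    candidates.append((0.0, -1.0))  # empty interval: labels everything 0
    for a, b in candidates:
        labelings.add(tuple(int(a <= x <= b) for x in points))
    return labelings

def shattered(points):
    return len(interval_labels(points)) == 2 ** len(points)

print(shattered([1.0, 2.0]))       # True: all 4 labelings of 2 points occur
print(shattered([1.0, 2.0, 3.0]))  # False: (1, 0, 1) needs a gap no interval has
# Hence the VC dimension of intervals on the line is 2.
```

The contrast with the paper is exactly the point of its conclusion: PAC learnability admits this clean combinatorial characterization, while EMX learnability admits no effective characterization at all.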
