Monday, August 9, 2021

An interesting formal argument on why machine learning will never be able to do language understanding (well)

Walid S. Saba, Machine Learning Won’t Solve Natural Language Understanding, The Gradient, August 7, 2021.

Saba opens by pointing out that when empirical methods were first explored and adopted in natural language computing, they were intended as solutions to relatively simple problems, not as a general approach to language understanding. However, he notes:

As McShane (2017) also notes, subsequent generations misunderstood this empirical trend that was motivated by finding practical solutions to simple tasks by assuming that this Probably Approximately Correct (PAC) paradigm will scale into full natural language understanding (NLU). As she puts it: “How these beliefs attained quasi-axiomatic status among the NLP community is a fascinating question, answered in part by one of Church’s observations: that recent and current generations of NLPers have received an insufficiently broad education in linguistics and the history of NLP and, therefore, lack the impetus to even scratch that surface.”

I’ve long suspected that something like that was involved. That is, younger practitioners simply lack deep intuitions about how thought and language are structured because they haven’t been exposed to the ideas of “classical” computational linguistics and cognitive science.

But that’s an aside. I’m interested in something else, a mathematical argument that machine learning (ML) techniques will never do very well at natural language understanding. In making this argument Saba calls on mathematical work by Shay Moran, Amir Yehudayoff, and colleagues [2]. I note that those articles are beyond my technical capacity and that I am relying on Saba’s formulation.

He begins by noting something that has long been known: rarely is any natural language statement fully explicit. Much is left unsaid; Saba calls this the missing text problem (MTP). It is one aspect of the well-known problem of common sense knowledge [3]. A sentence as simple as “John asked the waiter for the check,” for example, leaves unsaid that John was in a restaurant, had finished a meal, and intended to pay for it. Natural language understanding necessarily involves, in effect, recovering the missing text, that is, the information that is implicit in, but absent from, the text.

He then asserts, after having presented several examples:

The above discussion was (hopefully) a convincing argument that natural language understanding by machines is difficult because of MTP – that is, because our ordinary spoken language in everyday discourse is highly (if not optimally) compressed, and thus the challenge in “understanding” is in uncompressing (or uncovering) the missing text – while for us humans that was a genius invention for effective communication, language understanding by machines is difficult because machines do not know what we all know. But the MTP phenomenon is precisely why data-driven and machine learning approaches, while might be useful in some NLP tasks, are not even relevant to NLU. And here we present the formal proof for this (admittedly) strong claim:

The equivalence between (machine) learnability (ML) and compressibility (COMP) has been mathematically established. That is, it has been established that learnability from a data set can only happen if the data is highly compressible (i.e., it has lots of redundancies) and vice versa (see this article and the important article “Learnability can be Undecidable” that appeared in 2019 in the journal Nature [2]). While the proof between compressibility and learnability is quite technically involved, intuitively it is easy to see why: learning is about digesting massive amounts of data and finding a function in multi-dimensional space that ‘covers’ the entire data set (as well as unseen data that has the same pattern/distribution). Thus, learnability happens when all the data points can be compressed into a single manifold. But MTP tells us that NLU is about uncompressing. [...] machine learning is about discovering a generalization of lots of data into a single function. Natural language understanding, on the other hand, and due to MTP, requires intelligent ‘uncompressing’ techniques that would uncover all the missing and implicitly assumed text. Thus, machine learning and language understanding are incompatible – in fact, they are contradictory.
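While I can’t follow the proofs themselves, the intuition that learning is compression can be made a bit more concrete with a small sketch. The following bit of Python is my own illustration, not anything from Saba or from Moran and colleagues: a toy regression “learns” ten thousand noisy data points by squeezing them into three coefficients. Recovering missing text runs in the opposite direction; it has to supply information the signal never contained.

    import numpy as np

    # Toy illustration of "learning as compression": a hidden quadratic
    # pattern plus noise is sampled 10,000 times ...
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 10_000)
    y = 2.0 * x**2 - 0.5 * x + 1.0 + rng.normal(0.0, 0.05, x.size)

    # ... and the "learned" model is just three fitted coefficients
    # standing in for all 10,000 points.
    coeffs = np.polyfit(x, y, deg=2)
    y_hat = np.polyval(coeffs, x)

    print("fitted coefficients:", coeffs)
    print("mean squared error:", np.mean((y - y_hat) ** 2))

    # The data set is (approximately) recoverable from those three numbers;
    # that is the sense in which learnability goes hand in hand with
    # compressibility. Filling in the text missing from an utterance is the
    # reverse problem: it requires adding information that was never in the
    # signal to begin with, which is why Saba argues the two are at odds.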

I am very sympathetic to this argument. I made a rather informal version of it in my working paper on GPT-3 [4].

Still, one might imagine that, with a system trained on a truly massive text base, as GPT-3 is, the information missing from any one text, and needed to fully comprehend it, is somehow present in some other text in the training corpus, and thus present in the language model built from it. Whether it is present in a usable form is another matter. I note that in my little experiment involving GPT-3 and a Jerry Seinfeld bit, the model was able to come up with relevant information not present in the bit itself. It also made some hilarious mistakes [5].

Such intuitions as I have [1] leave me less than sanguine about the prospects of ML ‘solving’ the problem of natural language understanding. I’m not, however, sure how much those intuitions are worth. But I warrant they’re worth as much as those of an ML researcher who has little or no knowledge of classical computational linguistics and cognitive science.

References

[1] Intuition has been a running topic at New Savanna, https://new-savanna.blogspot.com/search/label/intuition.

In this particular context, see these posts:

Cognitive Science, AI, and Intuition: Or, what’s a word?, April 2, 2021, https://new-savanna.blogspot.com/2021/04/cognitive-science-ai-and-intuition-or.html.

Computational linguistics & NLP: What’s in a corpus? – MT vs. topic analysis [#DH], September 3, 2018, https://new-savanna.blogspot.com/2018/09/computational-linguistics-nlp-whats-in.html.

[2] Ofir David, Shay Moran, Amir Yehudayoff, On statistical learning via the lens of compression, arXiv:1610.03592v2 [cs.LG] 30 Dec 2016, https://arxiv.org/abs/1610.03592.

Shai Ben-David, Pavel Hrubes, Shay Moran, Amir Shpilka, and Amir Yehudayoff, Learnability can be undecidable, Nature Machine Intelligence, Vol. 1, January 2019, 44-48, sci-hub.se/10.1038/s42256-018-0002-3.

[3] Common sense knowledge is another common topic at New Savanna, https://new-savanna.blogspot.com/search/label/common%20sense%20knowledge.

See, for example, Some thoughts on why systems like GPT-3 will always have trouble with common sense knowledge, June 1, 2021, https://new-savanna.blogspot.com/2021/06/some-thoughts-on-why-systems-like-gpt-3.html.

[4] GPT-3: Waterloo or Rubicon? Here be Dragons, Working Paper, Version 2, August 20, 2020, 34 pp., https://www.academia.edu/43787279/GPT_3_Waterloo_or_Rubicon_Here_be_Dragons_Version_2.

[5] See the post, Analyze This! Screaming on the flat part of the roller coaster ride [Does GPT-3 get the joke?], May 7, 2021, https://new-savanna.blogspot.com/2021/05/analyze-this-screaming-on-flat-part-of.html.
