Thursday, October 1, 2020

Another approach to meaning in large statistical language models, such as GPT-3

I've recently been writing about meaning in AI engines such as GPT-3, in particular in What economic growth and statistical semantics tell us about the structure of the world and GPT-3: Waterloo or Rubicon? Here be Dragons. Jonathan Scott Enderle is interested in the topic as well, from a different point of view.
Jonathan Scott Enderle, "Toward a Thermodynamics of Meaning," arXiv:2009.11963v1 [cs.CL], 24 September 2020.

Abstract: As language models such as GPT-3 become increasingly successful at generating realistic text, questions about what purely text-based modeling can learn about the world have become more urgent. Is text purely syntactic, as skeptics argue? Or does it in fact contain some semantic information that a sufficiently sophisticated language model could use to learn about the world without any additional inputs? This paper describes a new model that suggests some qualified answers to those questions. By theorizing the relationship between text and the world it describes as an equilibrium relationship between a thermodynamic system and a much larger reservoir, this paper argues that even very simple language models do learn structural facts about the world, while also proposing relatively precise limits on the nature and extent of those facts. This perspective promises not only to answer questions about what language models actually learn, but also to explain the consistent and surprising success of cooccurrence prediction as a meaning-making strategy in AI.

1. Introduction

Since the introduction of the Transformer architecture in 2017 [1], neural language models have developed increasingly realistic text-generation abilities, and have demonstrated impressive performance on many downstream NLP tasks. Assessed optimistically, these successes suggest that language models, as they learn to generate realistic text, also infer meaningful information about the world outside of language.

Yet there are reasons to remain skeptical. Because they are so sophisticated, these models can exploit subtle flaws in the design of language comprehension tasks that have been overlooked in the past. This may make it difficult to realistically assess these models’ capacity for true language comprehension. Moreover, there is a long tradition of debate among linguists, philosophers, and cognitive scientists about whether it is even possible to infer semantics from purely syntactic evidence [2].

This paper proposes a simple language model that directly addresses these questions by viewing language as a system that interacts with another, much larger system: a semantic domain that the model knows almost nothing about. Given a few assumptions about how these two systems relate to one another, this model implies that some properties of the linguistic system must be shared with its semantic domain, and that our measurements of those properties are valid for both systems, even though we have access only to one. But this conclusion holds only for some properties. The simplest version of this model closely resembles existing word embeddings based on low-rank matrix factorization methods, and performs competitively on a balanced analogy benchmark (BATS [3]).
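
To make that last point concrete, here is a minimal sketch (mine, not Enderle's) of the kind of count-based, low-rank embedding that paragraph describes: count word-word cooccurrences, reweight them with positive PMI, factor the matrix with a truncated SVD, and answer analogies by vector offset, which is roughly how benchmarks like BATS are scored. The toy corpus, the sentence-level window, the PPMI weighting, and the dimensionality are my illustrative assumptions, not details taken from the paper.

import numpy as np
from collections import Counter
from itertools import combinations

# Toy corpus; a real experiment would use millions of sentences.
corpus = [
    "the king rules the kingdom".split(),
    "the queen rules the kingdom".split(),
    "a man walks in the city".split(),
    "a woman walks in the city".split(),
]

# 1. Vocabulary and symmetric cooccurrence counts within each sentence.
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for w1, w2 in combinations(sent, 2):
        counts[idx[w1], idx[w2]] += 1
        counts[idx[w2], idx[w1]] += 1

# 2. Positive pointwise mutual information (PPMI) weighting.
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((counts * total) / (row * col))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# 3. Low-rank factorization: a truncated SVD gives d-dimensional word vectors.
d = 4
U, S, _ = np.linalg.svd(ppmi)
vectors = U[:, :d] * S[:d]

# 4. Analogy by vector offset (the scheme behind benchmarks like BATS):
#    man : king :: woman : ?  is answered by king - man + woman, scored by cosine.
def analogy(a, b, c):
    target = vectors[idx[b]] - vectors[idx[a]] + vectors[idx[c]]
    sims = vectors @ target / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(target) + 1e-9
    )
    for w in (a, b, c):  # exclude the query words themselves
        sims[idx[w]] = -np.inf
    return vocab[int(np.argmax(sims))]

print(analogy("man", "king", "woman"))

On a realistic corpus this sort of offset query tends to recover "queen"; on a five-sentence toy it may not. The point is the pipeline: everything these vectors know, they learned from cooccurrence counts alone.
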

The assumptions and the mathematical formulation of this model are drawn from the statistical mechanical theory of equilibrium states. By adopting a materialist view that treats interpretations as physical phenomena, rather than as abstract mental phenomena, this model shows more precisely what we can and cannot infer about meaning from text alone. Additionally, the mathematical structure of this model suggests a close relationship between cooccurrence prediction and meaning, if we understand meaning as a mapping between fragments of language and possible interpretations. There is reason to believe that this line of reasoning will apply to any model that operates by predicting cooccurrence, however sophisticated. Although the model described here is a pale shadow of a hundred-billion-parameter model like GPT-3 [4], the fundamental principle of its operation, this paper argues, is the same.
To be presented at CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands.
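
The framing of language as a system in equilibrium with a much larger reservoir has a familiar textbook shape, and setting it beside the ordinary softmax form of a log-bilinear cooccurrence model shows why the analogy is tempting. The two equations below are just those standard forms, offered as orientation; they are not a reproduction of Enderle's derivation.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Canonical ensemble: a small system in equilibrium with a much larger
% reservoir at temperature T occupies microstate i with probability
\begin{equation*}
  P(i) = \frac{e^{-E_i / k_B T}}{Z},
  \qquad
  Z = \sum_j e^{-E_j / k_B T}.
\end{equation*}

% A log-bilinear cooccurrence model has the same exponential-family shape:
% word w appears in context c with a weight on the score w . c,
% which plays the role of a negative energy.
\begin{equation*}
  P(w \mid c) = \frac{e^{\,\vec{w} \cdot \vec{c}}}{\sum_{w'} e^{\,\vec{w}' \cdot \vec{c}}}.
\end{equation*}

\end{document}

In both cases the probability of a configuration is an exponential of a score, normalized over the alternatives, with the word-context score standing in for a negative energy. That structural match is presumably where the thermodynamic reading gets its purchase.
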
