David Chapman posted the following tweet this morning (the first in a stream):
In 1974, Joseph D. Becker pointed out that rigid rationalist Chomskian linguistics was an emperor without clothes, and explained how syntax actually works. Rigorously ignored for decades, his theory seems powerfully confirmed by current AI text generators. — David Chapman (@Meaningness), July 9, 2022
So I went looking for Becker’s paper and found it rather quickly:
Joseph D. Becker, The phrasal lexicon, TINLAP '75: Proceedings of the 1975 Workshop on Theoretical Issues in Natural Language Processing, June 1975, pp. 60–63, https://doi.org/10.3115/980190.980212
It currently has 928 citations in Google Scholar, which is certainly a respectable number.
Here’s the abstract:
Theoretical linguists have in recent years concentrated their attention on the productive aspect of language, wherein utterances are formed combinatorically from units the size of words or smaller. This paper will focus on the contrary aspect of language, wherein utterances are formed by repetition, modification, and concatenation of previously-known phrases consisting of more than one word. I suspect that we speak mostly by stitching together swatches of text that we have heard before; productive processes have the secondary role of adapting the old phrases to the new situation. The advantage of this point of view is that it has the potential to account for the observed linguistic behavior of native speakers, rather than discounting their actual behavior as irrelevant to their language. In particular, this point of view allows us to concede that most utterances are produced in stereotyped social situations, where the communicative and ritualistic functions of language demand not novelty, but rather an appropriate combination of formulas, cliches, idioms, allusions, slogans, and so forth. Language must have originated in such constrained social contexts, and they are still the predominant arena for language production. Therefore an understanding of the use of phrases is basic to the understanding of language as a whole. You are currently reading a much-abridged version of a paper that will be published elsewhere later.
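Becker's core claim, that we speak mostly by retrieving stored multi-word phrases and adapting them to the situation, can be caricatured in a few lines of Python. This is a toy sketch, not Becker's formalism: the phrase inventory, the slot notation, and the filler sets are all invented for illustration.

```python
import random

# A toy phrasal lexicon: stored multi-word chunks, some with open slots
# that productive processes fill in to fit the current situation.
# (Invented examples, in the spirit of Becker's formulas and cliches.)
PHRASES = [
    "as a matter of fact",
    "let me begin by thanking {NP}",
    "it turns out that {CLAUSE}",
]

FILLERS = {
    "NP": ["the committee", "our hosts", "my colleagues"],
    "CLAUSE": ["the lexicon is phrasal", "novelty is rare"],
}

def realize(template: str) -> str:
    """Adapt a stored phrase to the new situation by filling its slots."""
    out = template
    for slot, options in FILLERS.items():
        while "{" + slot + "}" in out:
            out = out.replace("{" + slot + "}", random.choice(options), 1)
    return out

random.seed(0)
# "Stitch together swatches" of stored text rather than build from scratch.
utterance = "; ".join(realize(p) for p in PHRASES)
print(utterance)
```

The point of the sketch is the division of labor: retrieval of whole phrases does most of the work, while the productive machinery (here, trivial slot-filling) plays the secondary, adaptive role Becker describes.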
I also found a later paper that builds on Becker’s idea:
Uri Zernik, Michael G. Dyer, The self-extending phrasal lexicon, Computational Linguistics, Volume 13, Issue 3-4, July-December 1987, pp 308–327, https://dl.acm.org/doi/10.5555/48160.48169
Here’s the abstract:
Lexical representation so far has not been extensively investigated in regard to language acquisition. Existing computational linguistic systems assume that text analysis and generation take place in conditions of complete lexical knowledge. That is, no unknown elements are encountered in processing text. It turns out however, that productive as well as non-productive word combinations require adequate consideration. Thus, assuming the existence of a complete lexicon at the outset is unrealistic, especially when considering such word combinations.
Three new problems regarding the structure and the contents of the phrasal lexicon arise when considering the need for dynamic acquisition. First, when an unknown element is encountered in text, information must be extracted in spite of the existence of an unknown. Thus, generalized lexical patterns must be employed in forming an initial hypothesis, in absence of more specific patterns. Second, senses of single words and particles must be utilized in forming new phrases. Thus the lexicon must contain information about single words, which can then supply clues for phrasal pattern analysis and application. Third, semantic clues must be used in forming new syntactic patterns. Thus, lexical entries must appropriately integrate syntax and semantics.
We have employed a Dynamic Hierarchical Phrasal Lexicon (DHPL) which has three features: (a) lexical entries are given as entire phrases and not as single words, (b) lexical entries are organized as a hierarchy by generality, and (c) there is no separate body of grammar rules: grammar is encoded within the lexical hierarchy. A language acquisition model, embodied by the program RINA, uses DHPL in acquiring new lexical entries from examples in context through a process of hypothesis formation and error correction. In this paper we show how the proposed lexicon supports language acquisition.
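Feature (c) is the interesting one: if entries are ordered from specific to general, then idioms simply shadow general patterns, and "grammar" is just the most general stratum of the same lexicon. A minimal Python sketch of that idea follows; the patterns, the `?X` variable notation, and the glosses are my own inventions, not Zernik and Dyer's actual representation.

```python
# Toy phrasal hierarchy: entries ordered from specific to general.
# Lookup prefers the most specific match, so frozen idioms shadow
# semi-productive phrases, which in turn shadow general patterns.
ENTRIES = [
    (["kick", "the", "bucket"], "die"),               # frozen idiom
    (["kick", "the", "?X"],     "strike the ?X"),     # semi-productive
    (["?V", "the", "?X"],       "?V applied to ?X"),  # general "grammar"
]

def match(tokens, pattern):
    """Return variable bindings if tokens fit pattern, else None."""
    if len(tokens) != len(pattern):
        return None
    bindings = {}
    for tok, pat in zip(tokens, pattern):
        if pat.startswith("?"):
            bindings[pat] = tok
        elif pat != tok:
            return None
    return bindings

def interpret(tokens):
    """Most-specific-first lookup through the phrasal hierarchy."""
    for pattern, gloss in ENTRIES:
        bindings = match(tokens, pattern)
        if bindings is not None:
            for var, tok in bindings.items():
                gloss = gloss.replace(var, tok)
            return gloss
    return None

print(interpret(["kick", "the", "bucket"]))  # idiom wins over the pattern
print(interpret(["kick", "the", "ball"]))    # falls through to "strike the ..."
```

Acquisition, in this picture, would amount to inserting a new, more specific entry when a general pattern mispredicts, which is roughly the hypothesis-formation-and-error-correction loop the abstract attributes to RINA.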
This reminds me of Daniel Kahneman’s distinction between System 1 thinking and System 2. System 1 is “Fast, automatic, frequent, emotional, stereotypic, unconscious” (from the Wikipedia entry) while System 2 is “slow, effortful, infrequent, logical, calculating, conscious.” The phrasal lexicon characterizes the organization of language in System 1. When operating in System 2, language has recourse to a detailed grammar as needed, though not, I’m pretty sure, a grammar of the type envisioned by Chomsky.
I would think that oral literature is organized in this manner. Isn’t that the import of Milman Parry’s thesis about the Homeric tales, as set forth in Alfred Lord’s The Singer of Tales (1960)? Is this what free-styling hip-hop rappers do? It has to be. Is it time for me to re-read David Rubin, Memory in Oral Tradition: The Cognitive Psychology of Epic, Ballads, and Counting-out Rhymes (Oxford UP, 1995)? For that matter, it looks like I may now have a reason to read Kahneman’s Thinking, Fast and Slow (2011).
Going back to Chapman’s original remark about current AI text generators, as powerful and important as they are, I certainly don’t think they capture all of language – see my working paper, GPT-3: Waterloo or Rubicon? Here be Dragons, Version 4.1. If you will, they don’t capture the System 2 aspect. Is there some way the System 2 aspect can be “bootstrapped in” without an enormous effort of hand-coding? There must be. When it is found, the quest for artificial minds will undergo yet another phase change.
Addendum 7.15.22: See this post from 2021, Think of GPT-3 as System 1. Now augment it with a symbolic System 2. [Dual-System, Neuro-Symbolic Reasoning].