GPT-3 and Homer? You’ve got to be kidding. No, I’m not. What’s the connection?
First, I look at large language models, mostly through tweets by David Chapman, who suggests that we examine Joseph Becker’s idea of the phrasal lexicon (as a Heideggerian ready-to-hand of speech communication). I then consider the Parry/Lord argument about the oral nature of the Homeric poems, and conclude with some discussion of what Homer knew that GPT-3 does not and likely cannot.
Large Language Models as Text Generators
A couple of days ago David Chapman posted a tweet in which he noted: “In 1974, Joseph D. Becker pointed out that rigid rationalist Chomskian linguistics was an emperor without clothes, and explained how syntax actually works. Rigorously ignored for decades, his theory seems powerfully confirmed by current AI text generators.” The post was the first in a string that contained screen shots from Becker’s paper. In a later series of tweets he said:
Your already-said is a resource available in the context for coming up with the next bit you are going to say, along with a stock of routine flat language patterns.
This is a central insight of ethnomethodological conversation analysis.
It’s also how current AI models work…
My expectation is that sciencing the early layers of current AI language models will reveal an enormous repertoire of cliches, idioms, parameterized “mad libs” patterns, stock phrases, joke patterns, song lyrics, snowclones, etc., as Becker suggested…
And plausibly a mid layer will include thousands of language patterns for stitching together two otherwise independent bits of text.
Since DL networks are quite unlike brains, verifying this would tell us very little about “intelligence,” but a great deal about language!
I was intrigued and went looking for Becker’s paper. Once I found it I took a quick look. SHAZAM! a light went on. I ran up a quick post, The phrasal lexicon [System 1 vs System 2 for language], where I mentioned Parry/Lord, and hence Homer, at the end, but without discussion. This is that discussion, or the beginning of it.
The phrasal lexicon
Let’s take a look at Becker’s paper:
Joseph D. Becker, The phrasal lexicon, TINLAP '75: Proceedings of the 1975 workshop on Theoretical issues in natural language processing, June 1975 Pages 60-63, https://doi.org/10.3115/980190.980212.
He begins by noting that all languages contain phrases known as idioms, such as, 1) “raining cats and dogs” or 2) “see the light” – examples from a dictionary, not Becker. We do not understand such phrases by composing the meanings of the components into the meaning of the whole. The first has nothing to do with either cats or dogs, though it often, but not always, is about rain, while the second is unrelated either to our sense of vision or to sources of photons in the visual range. We grasp the meaning of such phrases whole. Though they have the formal properties of phrases, semantically they are single lexical items.
Becker proposes that our phrasal lexicon, rather than being a smattering of idioms here and there, is in fact quite pervasive and is central to speech. The phrasal lexicon is central to the real-time, rough and ready, business of conversational interaction. He goes on to list six classes of such phrases (p. 61):
Class I – Polywords: “oldest profession”
Class II – Phrasal constraints: “by pure coincidence”
Class III – Deictic locutions: “for that matter”
Class IV – Sentence builders: “(person A) gave (person B) a (long) song and dance about (a topic)”
Class V – Situational Utterances: “How can I ever repay you?”
Class VI – Verbatim texts: “Better late than never”
He then goes on to sketch out the implications for an account of language production (p. 62):
It implies to me that the process of speaking is Compositional: We start with the information we wish to convey and the attitudes toward that information that we wish to express or evoke, and we haul out of our phrasal lexicon some patterns that can provide the major elements of this expression. Then the problem is to stitch these phrases together into something roughly grammatical, to fill in the blanks with the particulars of the case at hand, to modify the phrases if need be, and if all else fails to generate phrases from scratch to smooth over the transitions or fill in any remaining conceptual holes.
My guess is that phrase-adaption and generative gap-filling are very roughly equally important in language production, as measured in processing time spent on each, or in constituents arising from each.
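Becker’s production sketch – haul stored patterns out of the phrasal lexicon, fill in the blanks, stitch the results together – can be caricatured in a few lines of code. The tiny lexicon and the slot syntax below are my own invention for illustration, not Becker’s notation:

```python
# A toy phrasal lexicon: stored multi-word patterns, some with open slots.
# The entries and the {slot} syntax are invented for illustration.
PHRASAL_LEXICON = {
    "sentence_builder": "{A} gave {B} a song and dance about {topic}",
    "deictic": "for that matter",
    "situational": "How can I ever repay you?",
}

def produce(pattern_key, **fillers):
    """Phrase-adaptation plus generative gap-filling, in miniature:
    retrieve a stored pattern whole, then fill its open slots."""
    return PHRASAL_LEXICON[pattern_key].format(**fillers)

sentence = produce("sentence_builder",
                   A="the manager", B="us", topic="the schedule")
print(sentence)  # the manager gave us a song and dance about the schedule
```

The point of the caricature is only that retrieval of whole patterns does most of the work; generation from scratch is the fallback, not the norm.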
Later:
All in all, we must conclude that the phrasal lexicon is very real. Even excluding long verbatim texts, we probably know as many or more whole phrases than we know single words (and I suspect the disparity would be even greater for the under-educated, i.e. almost all of humanity, since book-learning adds more words but few social situations to the individual’s experience).
He goes on to list 37 such phrases used in the paper itself and after a bit of discussion he notes (p. 63):
Second, nothing in this paper says that so-called “generative” processes do not play an important role in language production. I assert that their role is equal to or less than that of phrasal processes, but that does not make it zero.
Thus he’s not suggesting that we toss syntax-based grammars out the window, not at all. They are important, but are not the only thing, as standard linguistic theorizing has assumed. Such rules, he suggests, are more important in writing, noting that we only learn to write well after we’ve been speaking fluently, that such learning “takes us years of strenuous effort” and “most of us never learn to write very well at that.”
Oral epic
And that brings us to the Homeric texts, Iliad and Odyssey, which have come down to us in written form. The basic problem of these texts is that we don’t know who created them or how. “Homer” is just a name that’s attached to the texts, but not to anything else. Moreover, they are repetitive, inconsistent, involve multiple dialects, seem multiply authored, and it was unclear whether they were composed orally or in writing. Well into the middle of the 20th century scholars didn’t know what to make of them.
This is certainly not the time and place to go into all of that. But Albert B. Lord laid it all out in a modern classic, The Singer of Tales (1960), which is available online (though you have to move through it either page by page or with an unmarked scroll bar without flexible independent access to different chapters). Lord in turn was following a line of investigation initiated by his teacher, Milman Parry. Their work is based on two things: 1) fieldwork in contemporary oral epics in the Balkans, and 2) painstaking analysis of the Homeric texts. The upshot is that they clarified the nature of oral composition and performance and determined that, yes, the Homeric texts had originated and evolved in an oral culture. There was no Homer; there was only a tradition. Just how the texts came to be written down, that we do not know, but we do know that they did originate as oral texts.
One of the strongest arguments against the idea that they are oral is that they are very long, with Iliad being almost 16,000 lines long and Odyssey about 12,000 lines. How could anyone possibly memorize that much? That was the question. It is the wrong question, for it implies a fundamentally literate conception of the text, a conception that doesn’t apply in oral cultures.
What is this literate conception? On what does the identity of a text depend? If I have a text of, say, The Wizard of Oz, what test must that text pass in order to qualify as being a valid text of The Wizard of Oz? It must be word-for-word identical to the original text, whatever that has been determined to be. As long as all the words are there and in the right order it doesn’t matter how many pages they’re printed on, it doesn’t matter what font they’re printed in, the color of ink is irrelevant, it doesn’t matter whether or not it is nicely bound in boards or paper. None of that determines identity. Identity is determined solely by having the proper words in the proper order and, of course, with the proper punctuation.
Given that conception of identity, memorizing texts such as Iliad and Odyssey would be a prodigious feat. But – think about it – if the texts didn’t originate on the written page, then how could anyone tell whether or not one particular telling was word-for-word identical to some Platonic urtext? Just what are you checking the performance against? Your memory of the text? How do we know that your memory is accurate?
No, the identity condition of an oral text simply is not the same as it is for a written text. The words change from one performance to another but the characters and their actions and interactions are invariant. That’s where identity is lodged.
Lord tells us (p. 63):
Stated briefly, oral epic song is narrative poetry composed in a manner evolved over many generations by singers of tales who did not know how to write; it consists of the building of metrical lines and half lines by means of formulas and formulaic expressions and of the building of songs by the use of themes. This is the technical sense in which I shall use the word "oral" and "oral epic" in this book. By formula I mean "a group of words which is regularly employed under the same metrical conditions to express a given essential idea." This definition is Parry's. By formulaic expression I denote a line or half line constructed on the pattern of the formulas. By theme I refer to the repeated incidents and descriptive passages in the songs. [...]
We shall see that in a very real sense every performance is a separate song; for every performance is unique, and every performance bears the signature of its poet singer. He may have learned his song and the technique of its construction from others, but good or bad, the song produced in performance is his own. The audience knows it as his because he is before them. The singer of tales is at once the tradition and an individual creator. His manner of composition differs from that used by a writer in that the oral poet makes no conscious effort to break the traditional phrases and incidents; he is forced by the rapidity of composition in performance to use these traditional elements. To him they are not merely necessary, however; they are also right.
It is the presence and pervasive use of those “formulas and formulaic expressions” that warrants the comparison with Becker’s phrasal lexicon. They may consist of several words, but they function as single units in the poem.
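The way a noun-epithet formula functions as a single unit, selected for its metrical shape, can be sketched schematically. The syllable counts below are invented placeholders, not real scansion of the Greek; the point is only the selection mechanism:

```python
# Toy model of formulaic composition: the same character has several
# stock epithet-formulas of different metrical lengths, and the singer
# picks whichever one fills the space left in the line.
# Syllable counts are invented placeholders, not real Greek scansion.
FORMULAS = {
    "Achilles": [
        ("swift-footed Achilles", 6),
        ("Achilles, son of Peleus", 7),
        ("Achilles", 3),
    ],
}

def fill_slot(character, syllables_remaining):
    """Return the longest stock formula that fits the remaining slot."""
    candidates = [(text, n) for text, n in FORMULAS[character]
                  if n <= syllables_remaining]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f[1])[0]

print(fill_slot("Achilles", 6))  # swift-footed Achilles
print(fill_slot("Achilles", 4))  # Achilles
```

On this picture the epithet is not a free stylistic choice; it is whatever ready-made unit the meter demands at that moment, which is exactly why the formulas behave as single lexical items.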
Lord devotes a great deal of attention to the poets he and Parry worked with in the Balkans (the online text includes recordings), for that’s where he gathered evidence about the procedures of improvising oral epic – for that is what is involved, improvisation, improvisation constrained by the requirements of the story. He doesn’t even get to Homer until chapter seven, of ten, with the tenth chapter being devoted to Medieval French and English epic.
Most of Lord’s exposition is about larger structural patterns rather than an examination of formulaic phrases and lines – he cites Parry’s work on that – and so is not directly to Becker’s point. Nor is this the time and place to analyze or summarize that work. But I would like to address a general question: What’s in a story, any story, not just classical Greek epics, beyond words and phrases?
Beyond words and phrases
What’s beyond words and phrases? The stories themselves, which consist of characters acting and interacting. Those are not defined by the words and phrases through which they are realized. Literary critics, anthropologists, and folklorists have devoted considerable work to identifying the “grammar” of stories, going back to Vladimir Propp’s Morphology of the Folktale early in the 20th century. During the “classical” era of symbolic computation, students of artificial intelligence and computational linguistics worked on story grammars, and this work continues to this day, often in connection with computer gaming – for a recent review, see Mark Riedl, An Introduction to AI Story Generation (which I discuss here).
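A story grammar in the Propp tradition treats plot functions, not words, as the units of composition. A minimal sketch, where the rules and symbols are my own toy grammar rather than Propp’s actual inventory of functions:

```python
import random

# A toy Propp-style story grammar: nonterminals expand into sequences
# of plot functions, independent of any surface wording.
# The rules are invented for illustration, not Propp's actual inventory.
GRAMMAR = {
    "STORY":      [["VILLAINY", "QUEST", "RESOLUTION"]],
    "VILLAINY":   [["villain harms victim"], ["villain abducts victim"]],
    "QUEST":      [["hero departs", "hero is tested",
                    "hero struggles with villain"]],
    "RESOLUTION": [["villain is defeated", "hero returns"]],
}

def expand(symbol, rng=random):
    """Recursively expand a symbol into a flat sequence of plot functions."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: a plot function
    production = rng.choice(GRAMMAR[symbol])
    return [f for part in production for f in expand(part, rng)]

plot = expand("STORY")
print(plot)
```

Every derivation yields a well-formed plot skeleton, whatever words eventually clothe it; that separation of levels is what the rest of this section turns on.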
Does GPT-3, or any other large language model (LLM), know anything about characters and actions, about plots? I suppose that the judicious answer would be: We don’t know. But I strongly suspect that, no, it does not. Why do I believe that? Because it’s not trained on characters, actions, plot points, and story arcs. It’s trained on words. Those things are realized through words, implemented by means of words, but they are not defined or characterized by words.
The notion of implementation is crucial. Word processing programs, for example, have various functions: cut and paste, outlining, formatting, and so on. These functions have to be expressed through routines written in some high-level computing language, but it doesn’t matter which one; moreover a given function can be expressed in various ways. These are two levels of organization, domain-level function and execution-level operations.
Stories are like that. Characters, motivation, actions, plot points – those are domain-level functions. Specific words and phrases are execution-level operations. What does word-level training allow an LLM to ‘learn’ about domain-level characters, actions, etc.? One extreme answer is: nothing. Another extreme answer would be: everything, provided the corpus is large enough and the model has sufficient parameters and an appropriate architecture. I suspect the answer is closer to nothing than to everything.
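The two-level point can be made concrete: one domain-level plot event surfaces in many different word sequences, so no single word sequence defines it. The event representation and the surface templates below are invented for illustration:

```python
# One domain-level event, many execution-level realizations.
# The event structure and templates are invented for illustration.
event = {"actor": "Odysseus", "action": "deceive", "patient": "Polyphemus"}

SURFACE_TEMPLATES = {
    "deceive": [
        "{actor} tricked {patient}.",
        "{patient} was taken in by {actor}.",
        "Cunningly, {actor} got the better of {patient}.",
    ],
}

# Every rendering realizes the same event; statistics over the word
# strings alone need not recover the shared underlying structure.
renderings = [t.format(**event) for t in SURFACE_TEMPLATES[event["action"]]]
for r in renderings:
    print(r)
```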
We know that the text generation prowess of LLMs is limited to relatively short texts. While it is reasonable to expect that allowing for wider context during training will increase the effective length of generated texts, I doubt that that is sufficient to support the creation of coherent stories even a quarter or a tenth of the length of the Homeric texts. Getting texts of that length would require that an explicit story grammar be coupled to a language generation module. Independently of the problem of linking a story grammar to a text generator (which surely is being done), I doubt that we have story grammars capable of Homeric richness.
This is not, I realize, a complete argument. It is only the skeleton of one. That will have to suffice for now.
I did quite a lot of epic memorization once upon a time (mostly Dante & Homer), but I think I was doing it wrong because I was focussed more on the words than the story. & my sister used to narrate the story of the Lord of the Rings to her daughter in sections as a bedtime story, both book and movie version; if she had been doing it in some metrical form, she would have been acting as an oral poet, and the metrical rules would have stabilized the form, to some degree. Also, Chris Gregory of the ANU has done work on a South Asian rice epic (long poems performed while processing rice, to make the work less boring); he reckoned that the main reciter had about 80k short lines under control, roughly equivalent to 40k long ones – not only the epic but other material such as wedding songs.
A final observation is that the grammar almost certainly helps to organize the phrasal lexicon, making it easier to remember and use as well as to stitch together the parts. I have occasionally spent some time contemplating verb-particle constructions such as put up/down/away etc., and all of these have multiple uses, ranging from extreme noncompositionality (‘put somebody up’ in British English) to mildly subtle noncompositionality: ‘put something down’ could mean either leave it anywhere roughly lower than chest height (e.g. ‘put that down’ – anywhere, I don’t care where, as long as you are not holding it anymore), or put it into a standard lower place where it is normally stored (less common I think, but frequent at our house, because my wife has trouble accessing the lower storage spaces and often asks me to put things there).
But in all these cases, and many others, the particle seems to go best after the verb, especially when the object is human.
Thanks, Avery. FWIW David Rubin has written an interesting monograph, Memory in Oral Traditions, where he studies the various devices used to stabilize the form and thus facilitate memorizing.
I know in jazz we talk about riffs, short stock phrases that get used and reused in improvising solos.