I’ve been thinking about the problems that GPT-3 has in dealing with poetry [1,2,3,4]. That led me to a variety of thoughts, about statistical semantics and poetry, about prose fiction, and about statistical semantics more generally.
Caveat: This is pretty much off the top of my head, so it will be a bit crude.
Statistical semantics and poetry
And I came to this:
GPT-3 works by predicting the next word, but it assumes conditional probabilities are the result of a single ordering function. Poetry isn’t like that. It has two ordering functions, one for form, or sound, and one for meaning.
That is, versification – by which I mean rhyme, meter, and all manner of sound patterning not directly subordinate to meaning and syntax – is one thing, and meaning is another. They are independent, but not completely so. Good poets deliberately play with the interaction between these two streams of verbal patterning.
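To make the two-streams idea concrete, here is a toy sketch. Everything in it is invented for illustration: the vocabulary, the "meaning" scores, the crude letter-matching stand-in for rhyme, and the names `rhymes`, `pick_next`, and `form_weight`. It is not how GPT-3 works, only a picture of two quasi-independent orderings competing for the same next word.

```python
# Toy illustration only: words, scores, and weights are all hypothetical.

def rhymes(a, b, n=2):
    """Crude stand-in for rhyme: the last n letters match (and the words differ)."""
    return a != b and a[-n:] == b[-n:]

def pick_next(candidates, meaning_score, rhyme_target, form_weight):
    """Pick the candidate with the best meaning score plus a weighted form bonus."""
    def total(word):
        form = 1.0 if rhymes(word, rhyme_target) else 0.0
        return meaning_score[word] + form_weight * form
    return max(candidates, key=total)

# Hypothetical "meaning" stream: how well each word fits the sense of the line.
meaning_score = {"ocean": 0.9, "sea": 0.8, "motion": 0.4}
candidates = list(meaning_score)

# With no weight on form, meaning alone decides (the prose-like case).
prose_choice = pick_next(candidates, meaning_score, "devotion", form_weight=0.0)

# With weight on form, the rhyme with "devotion" can override the best-fitting word.
verse_choice = pick_next(candidates, meaning_score, "devotion", form_weight=1.0)

print(prose_choice, verse_choice)
```

The single knob `form_weight` is of course far too simple. The point is only that the word chosen depends on how the two streams are traded off, and a single stream of conditional probabilities has that trade-off baked in rather than available as something to play with.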
In the particular case of Coleridge, he inter-related the two streams in one way in the conversation poems, and in a different way in “Kubla Khan”, something I argued at length in STC, Poetic Form, and a Glimpse of the Mind [5]. The conversation poems have a strong narrative thread and relaxed versification. “Kubla Khan” has a weak narrative thread, but rigorous and sophisticated versification.
How is an engine like GPT-3 going to learn this? How is it going to figure out that, for one kind of text, poetry, there are two quasi-independent streams of ordering, while for another kind of text, prose, there is only one?
But, is there only one stream for prose?
The form of prose fiction
In the case of prose fiction, is there a parallel to the dual streams characteristic of poetry? I’m inclined to think that, on general principle, there must be. But I don’t know offhand what that would be.
Of course, we have narrative verse, such as the Homeric poems. They’ve got the dual streams characteristic of poetry. But what of prose narrative? We have the distinction between plot and story, which I believe serves my theoretical purpose. And we’ve got ring composition, which is the same sort of thing – at least I believe it is.
But there must be more. What?
Language more generally
Let’s think more generally of language. For a variety of particular texts, the same meaning could be conveyed in a different way. Perhaps we can think of ordinary speech or prose as involving an optimization function over several quasi-independent considerations – and I believe that linguists look at this sort of thing at both the sentence and discourse level. So that single stream of conditional probability characteristic of statistical semantics is an enabling methodological fiction. And that fiction may be one source of the ‘stiffness’ of statistical language models.
When a model is developed over a very large corpus, many alternative forms of word-to-word development will be learned. The model will thus have some capacity for paraphrase and summarization. But this capacity, this limited flexibility, comes at the expense of considerable redundancy in the language model. Abstractly considered, it should be possible to eliminate much of that redundancy by factoring the model into independent components [6]. In practice, however, there is almost no way of doing this.
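A rough way to see what factoring would buy, under the strong and admittedly unrealistic assumption that the components are fully independent: a table over every combination of form and meaning needs one entry per pair, while two separate tables need only one entry per item. The categories, probabilities, and the helper `table_sizes` below are hypothetical, chosen only to show the arithmetic.

```python
from itertools import product

# Hypothetical, fully independent components (real language is not this clean).
form_probs = {"iambic": 0.7, "trochaic": 0.3}
meaning_probs = {"sea": 0.5, "sky": 0.3, "stone": 0.2}

# Unfactored model: one probability stored for every (form, meaning) pair.
joint = {
    (f, m): pf * pm
    for (f, pf), (m, pm) in product(form_probs.items(), meaning_probs.items())
}

joint_entries = len(joint)                                # grows as a product
factored_entries = len(form_probs) + len(meaning_probs)   # grows as a sum

def table_sizes(n_form, n_meaning):
    """Return (unfactored, factored) entry counts for the given category counts."""
    return n_form * n_meaning, n_form + n_meaning

print(joint_entries, factored_entries)
print(table_sizes(50, 10_000))
```

The gap between the product and the sum is the redundancy in question. The catch is that the streams are only quasi-independent, so a clean split like this loses exactly the interactions that matter, which is one way of seeing why the factoring is so hard to do in practice.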
I seem to be left with a conclusion that I’d arrived at before, though by a different route. Corpus models are crippled in two ways: 1) they don’t have access to the physical world, and 2) they lack the flexibility of a propositionally structured symbolic system. As I wrote earlier:
What Searle misses, though, is the way in which meaning is a function of relations among concepts, as I pointed out earlier (pp. 18 ff.). It seems to me, however – and here I’m just making this up – we can think of meaning as having both an intentional aspect, the connection of signs to the world, and a relational aspect, the relations of signs among themselves. Searle’s argument concentrated on the former and said nothing about the latter.
What of the intentional aspect when a person is writing or talking about things not immediately present, which is, after all, quite common? In this case the intentional aspect of meaning is not supported by the immediate world. Language use thus must necessarily be driven entirely by the relations signifiers have among themselves, Sydney Lamb’s point which we have already investigated (p. 18). [7]
[1] William Benzon, An Electric Conversation with Hollis Robbins on the Black Sonnet Tradition, Progress, and AI, with Guest Appearances by Marcus Christian and GPT-3, Working Paper, July 20, 2020, 13 pp. https://www.academia.edu/43668403/An_Electric_Conversation_with_Hollis_Robbins_on_the_Black_Sonnet_Tradition_Progress_and_AI_with_Guest_Appearances_by_Marcus_Christian_and_GPT_3.
[2] William Benzon, GPT-3 writes two sonnets, sorta’. The first is better than the second. [digital humanities], New Savanna, July 21, 2020, https://new-savanna.blogspot.com/2020/07/gpt-3-writes-two-sonnets-sorta-first-is.html.
[3] William Benzon, Reflections on and current status of my GPT-3 project, New Savanna, Aug. 12, 2020, https://new-savanna.blogspot.com/2020/08/reflections-on-and-current-status-of-my.html.
[4] William Benzon, GPT-3 meets “Kubla Khan” and the results are interesting, but not encouraging for AI poetry, blog post, New Savanna, Sept. 1, 2020, https://new-savanna.blogspot.com/2020/09/gpt-3-meets-kubla-khan-and-results-are.html.
[5] William Benzon, STC, Poetic Form, and a Glimpse of the Mind, Working Paper, November 2013, 45 pp. https://www.academia.edu/8139268/STC_Poetic_Form_and_a_Glimpse_of_the_Mind.
[6] I wonder if the compression scheme in the following paper is taking advantage of this, Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, Kris M. Kitani, N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning, arXiv:1709.06030v2 [cs.LG], https://arxiv.org/abs/1709.06030v2.
[7] William Benzon, GPT-3: Waterloo or Rubicon? Here be Dragons, Working Paper, Version 2, August 20, 2020, p. 29, https://www.academia.edu/43787279/GPT_3_Waterloo_or_Rubicon_Here_be_Dragons_Version_2.