Things are beginning to fall into place. I’m getting a feel for ChatGPT, and by implication, for deep learning. And by “feel” I mean just that: a feel, a feeling for, an intuition. I’m beginning to get a sense of what’s going on.
On this I’m a Piagetian. He argued that learning involves the interaction between two ‘movements of the mind,’ if you will. In accommodation you change your mind to fit the phenomena you’re learning about. That’s the learning part. But in order to do that, you have to figure out how to assimilate the phenomena to things you already know.
I already know quite a bit about “classical” symbolic approaches to computer modeling of language and the mind. One of my earliest publications was a cognitive network model of Shakespeare’s Sonnet 129, Cognitive Networks and Literary Semantics (1976), and I completed a dissertation on the subject two years after that. There I refined the work I’d done on Sonnet 129 and offered some remarks (a chapter, actually) on the long-term development of cognitive structures in human history.
A decade later David Hays and I published two papers about the brain. One of them, Metaphor, Recognition, and Neural Process (1987), took a cue from earlier work by Karl Pribram on holographic processing in the brain. Much the same mathematics would turn up in Yann LeCun’s pioneering work on convolutional neural networks a couple of years later, though I didn’t learn about that until a couple of years ago (my interests were elsewhere). For the other paper, Principles and Development of Natural Intelligence (1988), whose title is a shot across the bow of artificial intelligence, Hays and I read a wide range of material in neuroscience; cognitive, developmental, and comparative psychology; and evolution, and came up with five principles underlying intelligent behavior in humans. Then, around the end of the previous century and into the first decade of this one, I had quite a bit of correspondence with Walter Freeman, who had once been a student of Pribram’s and was a pioneer in the complex dynamics of the brain. That correspondence informed my book on music, Beethoven’s Anvil (2001). A bit later I took Freeman’s dynamics, crossed it with Sydney Lamb’s symbolic network model, and wrote up some notes on symbolic networks over attractor basins in the cerebral cortex.
That takes me into the early years of this century, though I didn’t post those notes until 2011. By that time things were jumping off in digital humanities, so I busied myself with topic models and such. Then GPT-3 was released in 2020, forcing me to think seriously about deep learning. I didn’t have direct access to it, though I got a bit of indirect access through Phil Mohun, and I read a lot about it.
That brings us to these working papers, which I present along with their abstracts, but without further comment. But I need to point out one final thing. This whole ‘journey’ (to use a popular cliché) began with my interest in Coleridge’s “Kubla Khan,” the subject of my 1972 MA thesis, THE ARTICULATED VISION: Coleridge's “Kubla Khan.” I published a considerably revised version of that reading in 1985. One of these papers revisits that subject in the context of deep learning and complex dynamics. I expect to return to that topic at some time in the future, though I do not know when; I have other work to do before that. I’m still working with and thinking about ChatGPT.
* * * * *
GPT-3: Waterloo or Rubicon? Here be Dragons, August 5, 2020 (Version 4.1 is the current version, May 7, 2022).
Abstract: GPT-3 is an AI engine that generates text in response to a prompt given to it by a human user. It does not understand the language that it produces, at least not as philosophers understand such things. And yet its output is in many cases astonishingly like human language. How is this possible? Think of the mind as a high-dimensional space of signifieds, that is, meaning-bearing elements. Correlatively, text consists of one-dimensional strings of signifiers, that is, linguistic forms. GPT-3 creates a language model by examining the distances and ordering of signifiers in a collection of text strings and computing over them so as to reverse-engineer the trajectories texts take through that space. Peter Gärdenfors’ semantic geometry provides a way of thinking about the dimensionality of mental space and the multiplicity of phenomena in the world, about how mind mirrors the world. Yet artificial systems are limited by the fact that they do not have a sensorimotor system that has evolved over millions of years; they have inherent limits.
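To make that picture a little more concrete, here is a minimal toy sketch of a text as a trajectory through a high-dimensional space. The random vectors merely stand in for learned embeddings; this is my illustration of the metaphor, not GPT-3’s actual machinery.

```python
# Toy illustration: a text as a trajectory through a high-dimensional space.
# Each token (signifier) gets a vector; the text then traces a path through
# that space. Random vectors stand in for learned embeddings.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8                  # toy dimensionality; real models use thousands
embedding = {}           # token -> vector

def embed(tokens):
    """Map each token to a fixed random vector; return the text as a
    sequence of points, i.e. a trajectory through the space."""
    for tok in tokens:
        if tok not in embedding:
            embedding[tok] = rng.normal(size=DIM)
    return np.stack([embedding[tok] for tok in tokens])

text = "in xanadu did kubla khan a stately pleasure dome decree".split()
trajectory = embed(text)                       # shape: (10, 8)

# One crude way to characterize the path: distances between successive points.
steps = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
print(trajectory.shape, steps.round(2))
```

A real language model learns such vectors, along with the distributional statistics of an enormous corpus, well enough that the next point on a trajectory can be predicted from the points that precede it.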
Direct Brain-to-Brain Thought Transfer: A High Tech Fantasy that Won't Work, September 17, 2020.
Abstract: Various thinkers (Rodolfo Llinás, Christof Koch, and Elon Musk) have proposed that, in the future, it would be possible to link two or more human brains directly together so that people could communicate without the need for language or any other conventional means of communication. These proposals fail to provide a means by which a brain can determine whether a neural impulse is endogenous or exogenous. That failure makes communication impossible; confusion would be the more likely result of such linkage. Moreover, in providing a rationale for his proposal, Musk assumes a mistaken view of how language works, a view cognitive linguists call the conduit metaphor. Finally, all these thinkers assume that we know what thoughts are in neural terms. We don’t.
To Model the Mind: Speculative Engineering as Philosophy, April 7, 2022.
Abstract: Are brains computers? Some say yes, some say no. Does it matter? Ideas about computing have certainly proven fruitful in understanding how brains give rise to minds. That’s what this paper is about. The central section is a review of Grace Lindsay’s wonderful book Models of the Mind: How Physics, Engineering, and Mathematics Have Shaped Our Understanding of the Brain (2021). I precede it with a bit of philosophy and follow it with brief notices of five books, each proposing computationally inspired models of the mind.
Symbols and Nets: Calculating Meaning in "Kubla Khan", May 11, 2022.
Abstract: This is a dialog between a Naturalist Literary Critic and a Sympathetic Techno-Wizard about the interaction of symbols and neural nets in understanding "Kubla Khan," which has an extraordinary structure. Each of its two parts is like a matryoshka doll nested three deep, with the last line of the first part being repeated in the middle of the second. They start out talking about traditional symbol processing, with addressable memory and nested loops, and end up talking about a pair of interlinked neural nets in which one (language forms) is used to index the other (meaning).
Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind, July 13, 2022.
Abstract: Miriam Yevick’s 1975 holographic logic suggests we need both symbols and networks to model the mind. I explore that premise by adapting Sydney Lamb’s relational network notation to represent a logical structure over basins of attraction in a collection of attractor landscapes, each belonging to a different neurofunctional area (NFA) of the cortex. Peter Gärdenfors provides the idea of a conceptual space, a low-dimensional projection of the high-dimensional phase space of an NFA. Vygotsky’s account of language acquisition and internalization is used to show how the mind is indexed. We then define a MIND as a relational network of logic gates over the attractor landscape of a neural network loosely partitioned into many NFAs. An INDEXED MIND consists of a GENERAL network and an INDEXING network adjacent to and recursively linked to it. A NATURAL MIND is one where the substrate is the nervous system of a living animal. An ARTIFICIAL MIND is one where the substrate is inanimate matter engineered by humans to be a mind; it becomes AUTONOMOUS when it is able to purchase its compute with services rendered.
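The abstract is dense with definitions, so here is a minimal sketch, in my own improvised terms rather than Lamb’s notation, of the kind of structure it describes: neurofunctional areas that settle into basins of attraction, logic gates defined over those basins, and a word-form area that indexes the rest. The classes and example areas are illustrative assumptions, not the paper’s formalism.

```python
# Minimal sketch of a relational network over attractor basins. Each
# neurofunctional area (NFA) settles into one basin of its attractor
# landscape; gates are logical conditions over basins. The classes and
# example areas are illustrative, not the paper's notation.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class NFA:
    name: str
    basins: set                   # labels for basins of attraction
    active: Optional[str] = None  # basin this area has currently settled into

@dataclass
class Gate:
    """Fires when every (NFA, basin) condition on its inputs holds."""
    inputs: List[Tuple[NFA, str]]
    output: Tuple[NFA, str]

    def fire(self) -> None:
        if all(nfa.active == basin for nfa, basin in self.inputs):
            nfa, basin = self.output
            nfa.active = basin

# GENERAL network: areas carrying perceptual and conceptual content.
vision  = NFA("vision",  {"red-round", "green-long"})
taste   = NFA("taste",   {"sweet", "sour"})
concept = NFA("concept", {"APPLE", "LIME"})

# INDEXING network: word forms linked back to the general network.
word = NFA("word-form", {"/apple/", "/lime/"})

gates = [
    Gate([(vision, "red-round"), (taste, "sweet")], (concept, "APPLE")),
    Gate([(concept, "APPLE")], (word, "/apple/")),  # index the concept
]

vision.active, taste.active = "red-round", "sweet"
for g in gates:
    g.fire()
print(concept.active, word.active)  # APPLE /apple/
```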
Discursive Competence in ChatGPT, Part 1: Talking with Dragons, January 5, 2023 (Version 2, January 11, 2023).
Abstract: Noam Chomsky’s idea of linguistic competence suggests a new approach to understanding how LLMs work. This approach requires careful analysis of text. Such analysis indicates that ChatGPT has explicit control over sophisticated discourse skills: 1) It possesses the capacity to specify high-level structures that regulate the organization of language strings into specific patterns: e.g. conversational turn-taking, story frames, film interpretation, and metalingual definition of abstract concepts. 2) It is capable of analogical reasoning in the interpretation of films and stories, such as Spielberg’s Jaws and A.I., and Tezuka’s Astro Boy stories. It must establish an analogy between some abstract interpretive theory (e.g. the ideas of René Girard) and people and events in a story. 3) It has some understanding of abstract concepts such as justice and charity. Such concepts can be defined over stories that exhibit them (metalingual definition). ChatGPT recognizes suitable stories and can revise them. 4) ChatGPT can adjust its level of discourse to accommodate children of various ages. Finally, much of ChatGPT’s discourse seems formulaic in a way similar to what Parry and Lord found in oral epic.
ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking. January 24, 2023 (Version 3, 2023).
Abstract: I make three arguments. The first is philosophical: the behavior of ChatGPT is so sophisticated that the ordinary concept of thinking is no longer useful in distinguishing between human behavior and the behavior of advanced AI; we don’t have a deep and explicit understanding of what either humans or advanced AI systems are doing. The second argument is about ChatGPT’s behavior. As a result of examining its output in a systematic way, short stories in particular, I have concluded that its operation is organized on at least two levels: 1) the parameters and layers of the LLM, and 2) higher-level grammars, if you will, that are implemented in those parameters and layers, much as high-level programming languages are implemented in assembly code. The third argument follows from the second: aspects of symbolic computation are latent in LLMs. An appendix gives examples of how a story grammar is organized into frames, slots, and fillers.
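To give the notion of a grammar implemented ‘above’ the parameters a bit more substance, here is a generic frames/slots/fillers sketch. The segment names and slots are placeholders of my own, not the examples in the paper’s appendix.

```python
# Toy frames/slots/fillers representation of a story grammar. Segment names
# and slots are placeholders, not the paper's appendix examples.
STORY_FRAME = {
    "opening":     {"protagonist": None, "setting": None},
    "disturbance": {"antagonist": None, "threat": None},
    "response":    {"plan": None, "helper": None},
    "outcome":     {"resolution": None},
}

def fill(frame, fillers):
    """Return a copy of the frame with slots bound to fillers.
    `fillers` maps (segment, slot) -> value."""
    filled = {segment: dict(slots) for segment, slots in frame.items()}
    for (segment, slot), value in fillers.items():
        filled[segment][slot] = value
    return filled

story = fill(STORY_FRAME, {
    ("opening", "protagonist"):    "a young princess",
    ("opening", "setting"):        "a castle by the sea",
    ("disturbance", "antagonist"): "a dragon",
    ("disturbance", "threat"):     "raids on the nearby villages",
    ("response", "plan"):          "ride out and confront the dragon",
    ("outcome", "resolution"):     "the dragon is driven from the land",
})
print(story["disturbance"])
```

The claim in the abstract is that something with this kind of articulation is latent in, and implemented by, the model’s parameters, much as a compiler implements a high-level language in machine code.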
ChatGPT tells stories, and a note about reverse engineering: A Working Paper, March 3, 2023.
Abstract: I examine a set of stories that are organized on three levels: 1) the entire story trajectory, 2) segments within the trajectory, and 3) sentences within individual segments. I conjecture that the probability distribution from which ChatGPT draws next tokens is nested according to those three levels, a hierarchy encoded in the weights of ChatGPT’s parameters. I arrived at this conjecture to account for the results of experiments in which ChatGPT is given a prompt containing a story along with instructions to create a new story based on that story but changing a key character: the protagonist or the antagonist. That one change then ripples through the rest of the story. The pattern of differences between the old story and the new one indicates how ChatGPT maintains story coherence, and the nature and extent of those differences depend roughly on the degree of difference between the key character and the one substituted for it. I conclude with a methodological coda: ChatGPT’s behavior must be described and analyzed on three levels: 1) the experiments exhibit surface-level behavior, 2) the conjecture concerns a middle level that contains the nested hierarchy of probability distributions, and 3) the transformer virtual machine is the bottom level.
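As a rough illustration of the comparison at the heart of those experiments, one can align an original story with its revision segment by segment and ask which segments changed when the key character was swapped. The segmentation and the two miniature ‘stories’ below are toy stand-ins of my own, not data from the paper.

```python
# Align an original story with its revision segment by segment and measure
# how much each segment changed. Segments and text are toy stand-ins.
import difflib

original = {
    "opening":     "A young princess lived in a castle by the sea.",
    "disturbance": "A dragon began to raid the nearby villages.",
    "outcome":     "The princess rode out and drove the dragon away.",
}
revised = {
    "opening":     "A brave robot lived in a station by the sea.",
    "disturbance": "A dragon began to raid the nearby villages.",
    "outcome":     "The robot rode out and drove the dragon away.",
}

for segment in original:
    ratio = difflib.SequenceMatcher(None, original[segment], revised[segment]).ratio()
    status = "changed" if ratio < 0.99 else "unchanged"
    print(f"{segment:12s} similarity = {ratio:.2f} ({status})")
```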