The mere fact that I’ve posted a substantial article, How ChatGPT tells stories, does not imply that I’ve stopped thinking about those issues. Not at all. The process of writing and then distributing an article is simply a device for bringing my thinking to a certain level of maturity. But the thinking continues.
Here are two further thoughts. The first is about conceptual spaces and might in fact find its way into a later version of the article, assuming I decide to produce one. The second is considerably more speculative, requires more work, and, in any event, would belong in a different kind of paper.
From a note to Gärdenfors on conceptual spaces
The procedure I have been using is derived from the analytical method Claude Lévi-Strauss employed in his magnum opus, Mythologiques. He started with one myth, analyzed it, and then introduced another one, very much like the first. But not quite. They are systematically different. He characterized the difference by a transformation – a term he took from algebraic group theory. He worked his way through hundreds of myths in this manner, each one derived from another by a transformation.
Here is what I have been doing: I give ChatGPT a prompt consisting of two things: 1) an existing story and 2) instructions to produce another story like it except for one change, which I specify. That change is, in effect, a way of triggering or specifying the “transformations” that Lévi-Strauss wrote about. What interests me is the ensemble of things that change along with the change I have specified. In some cases it’s quite striking, as you can see from the table in the article.
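For concreteness, here is a minimal sketch of how such a prompt might be assembled and sent to a model. The wording, the model name, and the helper function are hypothetical illustrations, not the exact prompts used in the article; the sketch assumes the OpenAI Python client and an API key in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def transform_story(source_story: str, change: str, model: str = "gpt-4") -> str:
    """Ask for a new story that differs from the source in one specified way."""
    prompt = (
        "Here is a story:\n\n"
        f"{source_story}\n\n"
        f"Write a new story just like it, except for one change: {change}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# e.g. transform_story(aurora_story, "the protagonist is William the Lazy")
```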
Though I mention your work in the paper, I don’t explore it. Now that the paper is (more or less) done, I’ve been thinking, and it seems to me that conceptual spaces provide a ‘natural’ way to account for the results of these experiments.
Most of the time I directed ChatGPT to change the protagonist, though sometimes I focused on the antagonist. The protagonist I use in the source stories is Princess Aurora. In one case the new protagonist was Prince Harry. Except for gender, these are similar characters, requiring minimal changes in the new story. In another case, however, the protagonist was William the Lazy. Since ChatGPT operates on the assumption that characters have an intrinsic ‘nature’ and that their actions must follow from that nature, it had to come up with a way for a lazy man to defeat a dragon. That required more extensive changes in the new story: William the Lazy had to summon his knights and get them to do the work. Still more changes were required when I had ChatGPT transform Princess Aurora into a giant chocolate milkshake. It had no trouble doing so, and the resulting story was quite different from the original. The whole mise-en-scène had changed.
So, let's create a conceptual space in which we place the protagonist of the original story and the protagonist of the new story. They will occupy different positions in that space, reflecting the fact that they have different values on the dimensions that define the space. Now, let’s take the difference between those positions and use that difference as an offset that we apply to the whole story, thus shifting its trajectory in semantic space.
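Here is a toy sketch of that idea. The embed() function is a stand-in for whatever embedding or conceptual-space model one might use (here it just produces deterministic pseudo-random vectors so the example runs); the point is only the bookkeeping: compute the difference between the two protagonists’ positions and add it to every point on the story’s trajectory.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in embedding: a deterministic pseudo-random vector per string."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

# Positions of the two protagonists in the (stand-in) conceptual space.
offset = embed("William the Lazy") - embed("Princess Aurora")

# The source story as a trajectory: one point per sentence.
source_trajectory = [embed(s) for s in [
    "Princess Aurora lived in the castle.",
    "A dragon threatened the kingdom.",
    "Aurora confronted the dragon and prevailed.",
]]

# Shift the whole trajectory by the protagonist offset.
shifted_trajectory = [point + offset for point in source_trajectory]
```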
It’s probably not quite that simple. In the case of William the Lazy, I doubt that the shift in semantic space would automatically produce the episode where he summons his knights. ChatGPT had to do a little work to come up with that. But on the whole it seems to me that this is the way to go.
Note, in particular, that this is quite different from what you would have to do with a story grammar built on symbolic systems. Though I never worked with story grammars, I was trained in symbolic systems (and analyzed a Shakespeare sonnet using a semantic network) and read the story-grammar literature. I dare say none of those systems could have done that task, much less done it so easily and naturally. It would have required extensive machinery.
It seems to me that metaphor and analogy could be handled in a similar fashion.
System time steps in brains and GPTs
This is a comment I posted to the semiotic physics post at LessWrong:
Have you thought of exploring the existing literature on the complex dynamics of nervous systems? It’s huge, but it does use the math you guys are borrowing from physics.
I’m thinking in particular of the work of the late Walter Freeman, who was a pioneer in the field. Toward the end of his career he began developing a concept of “cinematic consciousness.” As you know, the movement in motion pictures is an illusion created by the fact that the individual frames are projected on the screen more rapidly than the mind can resolve them. So, while the frames are in fact still, they change so rapidly that we see motion.
First I’ll give you some quotes to give you a feel for Freeman’s thinking (alas, you’ll have to read the article to see how those things connect up), then I’ll explain what that has to do with LLMs. The bracketed numbers are from Freeman’s article.
[20] EEG evidence shows that the process in the various parts occurs in discontinuous steps (Figure 2), like frames in a motion picture (Freeman, 1975; Barrie, Freeman and Lenhart, 1996).
[23] Everything that a human or an animal knows comes from the circular causality of action, preafference, perception, and up-date. It is done by successive frames of self-organized activity patterns in the sensory and limbic cortices. [...]
[35] EEG measurements show that multiple patterns self-organize independently in overlapping time frames in the several sensory and limbic cortices, coexisting with stimulus-driven activity in different areas of the neocortex, which structurally is an undivided sheet of neuropil in each hemisphere receiving the projections of sensory pathways in separated areas. [...]
[86] Science provides knowledge of relations among objects in the world, whereas technology provides tools for intervention into the relations by humans with intent to control the objects. The acausal science of understanding the self distinctively differs from the causal technology of self-control. "Circular causality" in self-organizing systems is a concept that is useful to describe interactions between microscopic neurons in assemblies and the macroscopic emergent state variable that organizes them. In this review intentional action is ascribed to the activities of the subsystems. Awareness (fleeting frames) and consciousness (continual operator) are ascribed to a hemisphere-wide order parameter constituting a global brain state. Linear causal inference is appropriate and essential for planning and interpreting human actions and personal relations, but it can be misleading when it is applied to microscopic-microscopic relations in brains.
Notice that Freeman refers to “a hemisphere-wide order parameter constituting a global brain state.” The cerebral cortex consists of roughly 16B neurons, each with roughly 10K connections. Further, all areas of the cortex have connections with subcortical regions. That’s an awful lot of neurons communicating in parallel in a single time step. As I recall from another article, these frames occur at a rate of 6-7 Hz.
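A quick back-of-envelope calculation using those figures (the numbers are the ones quoted above, not measurements of mine):

```python
neurons = 16e9        # cortical neurons, as quoted above
connections = 1e4     # connections per neuron
frame_rate = 6.5      # Hz, midpoint of the 6-7 Hz frame rate

print(f"{neurons * connections:.1e} connections in play")   # ~1.6e+14
print(f"one frame every {1000 / frame_rate:.0f} ms")        # ~154 ms
```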
The nervous system operates in parallel. I believe it is known that the brain exhibits a small-world topology, so all neurons are within a relatively small number of links of one another. Though at any moment some neurons will be more active than others, they are all active; the only inactive neuron is a dead neuron. Similarly, ANNs exhibit a high degree of parallelism. But LLMs are parallel virtual machines being simulated on so-called von Neumann machines. The use of multiple cores gives some parallelism, but it is quite small in relation to the overall number of parameters in the system.
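As a rough illustration of the small-world point, a Watts-Strogatz graph with many nodes but only a few links per node still has short average path lengths. The sizes here are tiny stand-ins, nothing like cortical scale:

```python
import networkx as nx

# Few links per node, yet everything stays within a handful of hops.
g = nx.watts_strogatz_graph(n=2000, k=10, p=0.1, seed=0)
print(nx.average_shortest_path_length(g))  # single digits, versus ~100 for the unrewired ring
```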
I propose that the process of generating a single token in an LLM is comparable to a single “frame” of consciousness in Freeman’s model. All the parameters in the system are visited during a single time step for the system. In the case of ChatGPT I believe that’s 175B parameters.
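Schematically, and setting aside everything about how a transformer actually computes, the claim is just this: each new token comes out of a pass in which the full parameter set participates. A toy stand-in:

```python
import numpy as np

# A toy autoregressive "model": one weight matrix standing in for all of an
# LLM's parameters. The content is nonsense; the bookkeeping is the point:
# producing each token requires a pass in which every parameter participates.
rng = np.random.default_rng(0)
vocab_size = 1000
weights = rng.normal(size=(vocab_size, vocab_size))  # stand-in for the full parameter set

def next_token(context: list[int]) -> int:
    """One system time step: the entire weight matrix is used to emit one token."""
    counts = np.bincount(context, minlength=vocab_size)  # crude summary of the context
    logits = weights @ counts                            # every weight participates here
    return int(np.argmax(logits))

prompt = [3, 14, 159]      # the prompt, as token ids
tokens = list(prompt)
for _ in range(5):         # five new tokens means five full passes over the weights
    tokens.append(next_token(tokens))
print(tokens)
```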
Thus the assertion that ChatGPT generates one token at a time, based on the previous string, while true, is terribly reductive and therefore misleading. The appearance of a token is in fact more or less a side effect of evolving a trajectory from the initial prompt.