Saturday, February 4, 2023

Some miscellaneous thoughts on ChatGPT, stories, and mechanistic interpretability

I started playing with ChatGPT on December 1, 2022, and made my first post on December 3. Including this post, I’ve made 54 posts about ChatGPT. Nine of those posts report the work of others, perhaps with a comment or two from me; the other 45 report my own experiments and observations. Some of those posts are quite long, over 2000 words, in part because I include examples of ChatGPT’s output. I suspect this is the longest run of work on a single subject since I’ve been posting to New Savanna, which started in April 2010.

It’s time to ramble through some things just to get them on the table.

Where do things stand with the tantalizing working paper?

I’m not sure. I uploaded the first version on Jan. 23 and the second version on Jan. 28, and then, the next day, decided I needed to revise it yet again. On Thursday (Feb. 2) I put up a long post in which I thought things through: ChatGPT: Tantalizing afterthoughts in search of story trajectories [induction heads]. I’m still thinking them through.

Do I make a minimal revision for Version 3 and upload it ASAP? Or do I try to do a bit better than that? At this moment it’s not clear to me just what a minimal revision would be. Perhaps I could add a paragraph or two up front saying that I’ve decided to drop the idea of a story grammar in favor of the idea of a story trajectory, but that a proper accounting of the change is more than is sensible for this paper.

Meanwhile I’m working on a relatively short paper in which I publish a number of before-and-after tables like the ones in A note about story grammars in ChatGPT. That note had four tables. The short paper I’m working on would have ten or more tables, plus light commentary on them.

We’ll see.

Mechanistic interpretability

The need to change things was brought on by looking at a very interesting paper by some researchers at Anthropic, In-Context Learning and Induction Heads, which I discussed in Thursday’s tantalizing post (link above). They are working on mechanistic interpretability. What is going on in the neural net to produce the behavior we observe?

By working with ‘toy model’ transformers (with only two layers), these researchers were able to identify a mechanism they call induction heads. Judging from what they say about that mechanism, it seems that induction heads are responsible for performing the story-generation task I’ve been exploring. The idea is to have ChatGPT take a story I give it and create a new one by changing the protagonist (or antagonist). The induction-head mechanism then traverses the old story, inserts the requested change, and continues the traversal, making further changes as needed to preserve internal consistency.
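To make the intuition concrete, here is a toy sketch, in plain Python, of the two behaviors involved: the [A][B] … [A] → [B] pattern the Anthropic paper attributes to induction heads, and copying-with-substitution applied to a story. This is my own cartoon of the behavior, not a model of the transformer internals; the function names, tokens, and substitution table are made up for illustration.

```python
# Toy cartoons of two ideas, written as plain Python rather than as
# transformer circuits.

def induction_predict(context):
    """[A][B] ... [A] -> [B]: find the most recent earlier occurrence of
    the current token and copy whatever followed it."""
    current = context[-1]
    for i in range(len(context) - 2, -1, -1):   # scan backwards through the context
        if context[i] == current:
            return context[i + 1]               # copy the old continuation
    return None  # no earlier match; a real model falls back on other heads

def retell(old_story, substitutions):
    """Traverse an old story token by token, copying it into a new story
    while applying the requested changes (e.g. a new protagonist)."""
    return [substitutions.get(tok, tok) for tok in old_story]

tokens = "the princess rode into the forest and then the princess".split()
print(induction_predict(tokens))                      # -> "rode"
print(" ".join(retell(tokens, {"princess": "knight"})))
```

The second function is, of course, a drastic simplification: preserving internal consistency would also mean adjusting pronouns, titles, and downstream events, which is exactly the behavior the before-and-after tables are meant to document.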

So, those before-and-after tables I’ve been producing are observations about ChatGPT’s behavior. What inferential and experimental steps will be required to understand those observations in terms of activity in the neural net? I’m certainly not in a position to answer that question. Nor, I suspect, are the researchers at Anthropic. But that’s what interests me: the ‘territory’ between that kind of observation and what we currently know about how these models work.

I think there’s a lot of work that can be done making systematic observations about the behavior of ChatGPT, or of any LLM. What’s the best way to do that? What kinds of observations would be most useful to people working on mechanistic interpretability?
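As one possible answer, here is a minimal sketch of a systematic protocol: fix a base story, run the same protagonist-swap prompt repeatedly, and save every before-and-after pair for later tabulation. It assumes the OpenAI Python client with an API key in the environment; the base story, prompt wording, model name, and output file are placeholders, not recommendations.

```python
# Minimal sketch: collect before-and-after story pairs systematically.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
import json
from openai import OpenAI

client = OpenAI()

BASE_STORY = "Once upon a time a princess lived in a castle by the sea..."  # placeholder
PROMPT = ("Here is a story:\n\n{story}\n\n"
          "Tell the same story, but replace the protagonist with {new_hero}.")

def collect_pairs(new_heroes, model="gpt-3.5-turbo"):
    """Run the same substitution prompt for each new protagonist and
    return (before, after) records for later tabulation."""
    records = []
    for hero in new_heroes:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": PROMPT.format(story=BASE_STORY, new_hero=hero)}],
        )
        records.append({"before": BASE_STORY,
                        "change": hero,
                        "after": response.choices[0].message.content})
    return records

if __name__ == "__main__":
    pairs = collect_pairs(["a dragon", "a robot", "a colony of bees"])
    with open("before_after_pairs.json", "w") as f:
        json.dump(pairs, f, indent=2)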

Origins: The first story or the key myth

And that brings me back to those story experiments. One point I’ve been making is that whatever it is that ‘defines’ what a story is, it works at a ‘higher level’ than sentence syntax; it is something operating on the sentence-generation mechanism. How do we characterize this?

I’d been playing with the idea of treating sentence-level syntax as analogous to assembly language and the story mechanism as analogous to a high-level programming language. But, for reasons I explained in Thursday’s tantalizing post, I now think the analogy is misleading.

Now, given an existing story, I can see how the induction-head mechanism could create a new story. That would seem to obviate the need for some special ‘higher level’ mechanism – depending, of course, on how induction heads would work in this case. That’s all well and good for the experiments I’ve been running.

But where did the ‘first’ story come from? If you prompt ChatGPT with “tell me a story,” it will do so. Where did that story come from? Is ChatGPT deriving it from some hidden story? Perhaps, but perhaps not. I assume that the word “story” tells it to generate a certain kind of trajectory through its activation space. How is that trajectory characterized?

That’s a question for the mechanistic interpretability folks.

And it is roughly analogous to a methodological problem Lévi-Strauss faced at the beginning of The Raw and the Cooked, which is where I got the idea for my before-and-after experiments. He designated one myth as the key myth and worked his way out from there, making pairwise comparisons between myths and noting what differs between the myths in each pair. So, that myth is key in that it’s where he started his analytic procedure. But he certainly was not asserting that the key myth was somehow historically prior to all the others, or that he was analyzing the myths in historical sequence. The key myth was an arbitrary choice of starting point.

Lévi-Strauss was trying to characterize the ‘logic’ underlying the construction of myths. As strange as those stories might seem to us, there was a logic to them. What is it? What is the trajectory through ‘knowledge space’ that generates myths?

Will understanding how ChatGPT generates stories give us insight into the myths Lévi-Strauss investigated? I suspect so.
