Wednesday, February 5, 2025

From LLM mechanisms to ring-composition: A conversation with Claude 3.5

I've published a new working paper. Title above; abstract, table of contents, and prefatory remarks below:

Abstract: This is a wide-ranging discussion about narrative structure and how it might be encoded in large language models.

  1. Word Embeddings and Training: We clarified how tokens (subword units) are mapped to high-dimensional vectors, discussed how these vectors are updated during training, and explored how the token vocabulary (around 50K) can cover a much larger word vocabulary through subword tokenization.
  2. Multi-scale Organization in Narrative: We examined different types of transitions (word-to-word, sentence-to-sentence, paragraph-to-paragraph), discussed how probability distributions for next-token prediction vary depending on the type of transition, and noted that constraints are tighter at the word level but looser at higher narrative levels.
  3. Empirical Evidence from "Heart of Darkness": We analyzed two charts, the frequency of "Kurtz" mentions and paragraph lengths. We found periodic patterns in Kurtz mentions that persisted even as their frequency increased, identified a significant peak in paragraph length coinciding with Kurtz's introduction, and discovered that women's roles bookend the narrative to form a ring composition.
  4. Implications for LLMs: We examined how different scales of narrative structure might be encoded in transformer architectures, explored how attention mechanisms might capture these patterns implicitly, and considered the limitations of context window size for capturing full novel-length patterns.
  5. Future Research Possibilities: What is the potential for analyzing other texts for similar patterns? What are the challenges of computationally detecting ring composition? Finally, we explored the possibility of using LLMs to verify structural patterns across large corpora.
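For readers who want to see point 1 in action, here is a minimal sketch (mine, not from the paper) using the tiktoken library, which exposes GPT-2's byte-pair-encoding vocabulary. It shows how a roughly 50K token vocabulary covers arbitrary words: common words get a single token, rare words are split into familiar subword pieces.

```python
# A minimal sketch of subword tokenization, assuming the tiktoken
# library is installed (pip install tiktoken); any GPT-2-style BPE
# tokenizer would illustrate the same point.
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2's byte-pair-encoding vocabulary
print(enc.n_vocab)  # 50257 tokens, the ~50K vocabulary discussed above

# Common words map to a single token; rare words fall outside the
# token vocabulary and are split into subword pieces.
for word in ["the", "Kurtz", "antidisestablishmentarianism"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")
```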

The upshot: narrative structure operates at multiple scales simultaneously, and these patterns might be encoded in transformer architectures in ways analogous to holographic encoding, with different frequencies capturing different levels of structure. 
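And for point 3, here is a rough sketch of how one might recompute the two series behind the charts, per-paragraph "Kurtz" mentions and paragraph lengths, and probe the mention series for periodicity. The filename is hypothetical; any plain-text copy of the novel (e.g., Project Gutenberg's) will do, and the FFT step is just one crude way to look for the periodic pattern, not the method used in my original analysis.

```python
# A rough sketch of the two measurements behind the charts discussed
# in Part 3. Assumes a plain-text copy of the novel saved locally as
# "heart_of_darkness.txt" (the filename is hypothetical); paragraphs
# are taken to be blank-line-separated blocks, which matches the
# Project Gutenberg formatting.
import re
import numpy as np

with open("heart_of_darkness.txt", encoding="utf-8") as f:
    text = f.read()

paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]

# Series 1: how often "Kurtz" is mentioned in each paragraph.
kurtz_counts = np.array([p.count("Kurtz") for p in paragraphs])

# Series 2: paragraph length in words.
para_lengths = np.array([len(p.split()) for p in paragraphs])

print("longest paragraph (words):", para_lengths.max(),
      "at paragraph index", int(para_lengths.argmax()))

# A crude look for periodicity: the magnitude spectrum of the
# mean-centered mention series. A strong low-frequency component
# would be consistent with the periodic pattern noted above.
spectrum = np.abs(np.fft.rfft(kurtz_counts - kurtz_counts.mean()))
k = spectrum[1:].argmax() + 1  # skip the zero-frequency bin
print("strongest period:", len(kurtz_counts) / k, "paragraphs")
```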

Contents

Claude 3.5 summarizes the discussion
Prefatory remarks: Why publish first tentative thoughts?
Part 1: Preliminaries, tokens and vectors
Part 2: From positional encoding to stories
Part 3: Narrative structure on multiple levels, Heart of Darkness

Prefatory remarks: Why publish first tentative thoughts?

This set of notes started with me asking Claude 3.5 to clarify a few things about the mechanisms of LLMs (Part 1). As things got clearer, I started bringing in my own research, first some work I did with stories and ChatGPT two years ago, which is about how plot structures might be encoded in LLMs (Part 2). Then I introduced some empirical work I did on Heart of Darkness over a decade ago (Part 3); that work is directly related to the ChatGPT story work. This discussion took place in several sessions over February 2–4.

This is tentative work. Not quite first thoughts, but early thoughts. I am making it publicly available for two reasons:

  1. It is an example of how a sophisticated chatbot, such as Claude 3.5, can be used to flesh out ideas.
  2. The ideas developed in this conversation, while preliminary, might be of use to humanists, literary critics in particular, who are thinking about how these tools can be used in research.

On the first point, note that I turned to Claude on these issues because, after reading many explainers-for-the-unwashed and looking around in the technical literature, certain matters were still unclear to me. This happens a lot, so it will be useful to have Claude around to help me figure out what's going on. Perhaps others can benefit from this discussion as well.

On the second point, these discussions bear directly on questions being explored by digital humanists: How can we use these technologies in our work? The internal mechanisms of LLMs are a bit mysterious. The dialog I had with Claude led to some specific ideas about how narrative structures could be encoded in LLMs in a way that is independent of the specific actors and settings in which a story is realized. This has implications for the study of oral narrative and traditional story grammars, though I don't go into these in the discussion.

As I have no idea when I will get around to a more formal presentation of this work – this research is very much in medias res – I thought I would get these ideas out now. Others might find them useful.
