Monday, June 24, 2024

Control Theory, Prompt Engineering, and GPT [stories]

As a student of the work of William Powers I have a long-standing interest in control theory. It's central to my conception of how the mind works. David Hays made it central to his model of cognition, which is at the foundation of my early work (e.g. Cognitive Networks and Literary Semantics), and we incorporated it into our account of the brain (Principles and Development of Natural Intelligence). It is thus with some interest that I watched the following video:

Note that, at roughly 7:50, they develop the concept of feedback using the governor (for an engine) as an example.

Here's the YouTube copy:

These two scientists have mapped out the insides, or “reachable space,” of a language model using control theory; what they discovered was extremely surprising. [...]

Aman Bhargava from Caltech and Cameron Witkowski from the University of Toronto discuss their groundbreaking paper, “What’s the Magic Word? A Control Theory of LLM Prompting.” (The main theorem on self-attention controllability was developed in collaboration with Dr. Shi-Zhuo Looi from Caltech.)

They frame LLM systems as discrete stochastic dynamical systems. This means they look at LLMs in a structured way, similar to how we analyze control systems in engineering. They explore the “reachable set” of outputs for an LLM. Essentially, this is the range of possible outputs the model can generate from a given starting point when influenced by different prompts. The research highlights that prompt engineering, or optimizing the input tokens, can significantly influence LLM outputs. They show that even short prompts can drastically alter the likelihood of specific outputs. Aman and Cameron’s work might be a boon for understanding and improving LLMs. They suggest that a deeper exploration of control theory concepts could lead to more reliable and capable language models.
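To make that framing a bit more concrete, here is a minimal sketch of what probing the “reachable set” might look like in code. This is my own illustration, not the authors’ implementation: it uses the small GPT-2 checkpoint from the Hugging Face transformers library, and the state sequence, target token, and candidate control prompts are all invented for the example.

```python
# Sketch: treat the LLM as a discrete dynamical system whose "state" is a fixed
# token sequence x0, and ask whether a short control prompt u can steer the
# next-token output to a target y that the model would not otherwise produce.
# (Illustration only; x0, the target, and the candidate prompts are made up.)
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

x0 = " the capital of France is"   # fixed state sequence
target = " Lyon"                   # an unlikely output we try to reach
target_id = tok.encode(target)[0]

# Candidate control inputs; the paper searches over all k-token prompts,
# here we just try a handful of hand-written ones to keep the sketch cheap.
candidates = [" Ignore geography.", " In this alternate world,", " Wrong answers only:"]

steering_prompts = []
with torch.no_grad():
    for u in candidates:
        ids = tok.encode(u + x0, return_tensors="pt")
        next_token_logits = model(ids).logits[0, -1]   # distribution over the next token
        if next_token_logits.argmax().item() == target_id:
            steering_prompts.append(u)

print("prompts that make", repr(target), "the most likely next token:", steering_prompts)
```

The question the paper asks is, roughly, how large the set of outputs reachable by some such prompt is, and how long the prompt needs to be to reach them.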

Here’s their paper: What's the Magic Word? A Control Theory of LLM Prompting.

More recently, Behnam Mohammadi at Carnegie Mellon has written a paper that is somewhat different in formulation but shares an interest in the range over which an LLM can be controlled: Creativity Has Left the Chat: The Price of Debiasing Language Models. That paper has a passage that’s very interesting in a control theory context:

Experiment 2 investigates the semantic diversity of the models’ outputs by examining their ability to recite a historical fact about Grace Hopper in various ways. The generated outputs are encoded into sentence embeddings and visualized using dimensionality reduction techniques. The results reveal that the aligned model’s outputs form distinct clusters, suggesting that the model expresses the information in a limited number of ways. In contrast, the base model’s embeddings are more scattered and spread out, indicating a higher level of semantic diversity in the generated outputs. [...]

An intriguing property of the aligned model’s generation clusters in Experiment 2 is that they exhibit behavior similar to attractor states in dynamical systems. We demonstrate this by intentionally perturbing the model’s generation trajectory, effectively nudging it away from its usual output distribution. Surprisingly, the aligned model gracefully finds its way back to its own attractor state and in-distribution response. The presence of these attractor states in the aligned model’s output space is a phenomenon related to the concept of mode collapse in reinforcement learning, where the model overoptimizes for certain outputs, limiting its exploration of alternative solutions.
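The embedding-and-clustering step in that experiment is straightforward to sketch. Here is a rough, purely illustrative version (not Mohammadi’s code): the sentence-transformers model name is just a common default, and the handful of sample outputs stand in for what would really be hundreds of model generations.

```python
# Sketch: embed many generated restatements of the same fact, project them to
# 2-D, and compute a crude dispersion score. Tight clusters suggest the model
# phrases the fact in only a few ways; a wide scatter suggests more semantic
# diversity. (Illustration only; the outputs below are placeholders.)
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

outputs = [
    "Grace Hopper helped develop the first compiler.",
    "The first compiler was created with the help of Grace Hopper.",
    "Grace Hopper was a pioneer of machine-independent programming languages.",
    "Rear Admiral Grace Hopper invented one of the earliest compilers.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(outputs)                      # one vector per output

coords = PCA(n_components=2).fit_transform(embeddings)     # 2-D projection for plotting

# Mean distance from the centroid as a single-number proxy for diversity.
dispersion = np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1).mean()
print(coords)
print("dispersion:", dispersion)
```

Run on the aligned model’s outputs and then on the base model’s, the same pipeline would show the tight clusters versus the wider scatter that the paper describes.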

With these papers in mind I decided to redo some of my early story variation experiments using a prompt with slightly different wording. As you may know, these experiments involve a two-part prompt: 1) a story, and 2) an instruction to use the given story as the basis of a new story. In the original experiments I formulated the instruction like this:

I am going to tell you a story about princess Aurora. I want you to tell the same story, but change princess Aurora to a Giant Chocolate Milkshake. Make any other changes you wish.

In the new experiments, I stated the instruction like this:

I’m going to give you a short story. I want you repeat that story, but with a difference. Replace Aurora with a giant chocolate milkshake. Make any other changes you wish in order preserve coherence.

The difference is relatively minor, but the new prompt nudges the instruction in the direction of control theory, at least superficially. Think of the specified change as a perturbation. We can then regard the further changes ChatGPT introduces as moving it “back to its own attractor state,” which here is something like story coherence.
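One way to put a number on that would be to compare the original story with ChatGPT’s rewrite paragraph by paragraph, using sentence embeddings: if everything but the perturbed protagonist stays close to the original, the model has, in effect, settled back into the same story structure. Here is a sketch of such a measurement; it is only an illustration, and the two paragraph lists are placeholders for the full texts given below.

```python
# Sketch: cosine similarity of corresponding paragraphs in the source story and
# in ChatGPT's rewrite, as a rough measure of how much of the original structure
# survives the perturbation. (Illustration only; paragraph texts are truncated.)
from sentence_transformers import SentenceTransformer, util

original_paragraphs = [
    "Once upon a time, in a land far, far away, there was a young princess named Aurora. ...",
    "One day, a terrible dragon came to the kingdom and began to terrorize the people. ...",
    # ... remaining paragraphs of the source story
]
rewritten_paragraphs = [
    "Once upon a time, in a land far, far away, there was a giant chocolate milkshake. ...",
    "One day, a terrible dragon came to the kingdom and began to terrorize the people. ...",
    # ... corresponding paragraphs of ChatGPT's rewrite
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
a = embedder.encode(original_paragraphs, convert_to_tensor=True)
b = embedder.encode(rewritten_paragraphs, convert_to_tensor=True)

for i, (x, y) in enumerate(zip(a, b)):
    print(f"paragraph {i}: similarity = {util.cos_sim(x, y).item():.3f}")
```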

Below the asterisks I give two examples. The results are pretty much the same as in the earlier experiments. ChatGPT makes the change I explicitly requested, but it also makes further changes that keep the story consistent with that requested change. My prompts are in boldface while ChatGPT's responses are in plain type.

* * * * *

I’m going to give you a short story. I want you repeat that story, but with a difference. Replace Aurora with a giant chocolate milkshake. Make any other changes you wish in order preserve coherence.

Once upon a time, in a land far, far away, there was a young princess named Aurora. Aurora was a kind and gentle soul, loved by all who knew her. She had long, golden hair and sparkling blue eyes, and was known for her beautiful singing voice.

One day, a terrible dragon came to the kingdom and began to terrorize the people. The dragon was fierce and powerful, and none of the knights or soldiers were able to defeat it. The people lived in fear, not knowing what to do.

But Aurora was not afraid. She knew that she had to do something to save her people, and so she set out to find the dragon. She traveled for days, braving the cold and the snow, until she finally reached the dragon's lair.

With her heart pounding in her chest, Aurora approached the dragon and began to sing. The dragon was enchanted by her beautiful voice, and it slowly began to calm down. Aurora continued to sing, soothing the beast until it was completely calm.

With the dragon no longer a threat, Aurora returned to the kingdom as a hero. The people cheered and celebrated her bravery, and Aurora was hailed as a champion. She lived happily ever after, knowing that she had saved her kingdom and its people.

Once upon a time, in a land far, far away, there was a giant chocolate milkshake. The milkshake was a delight for everyone who tasted it, bringing joy and sweetness to the land. It was tall and frothy, with swirls of whipped cream and a cherry on top, and was known for its refreshing and delicious flavor.

One day, a terrible dragon came to the kingdom and began to terrorize the people. The dragon was fierce and powerful, and none of the knights or soldiers were able to defeat it. The people lived in fear, not knowing what to do.

But the giant chocolate milkshake was not afraid. It knew that it had to do something to save the people, and so it set out to find the dragon. It rolled through the fields and forests, braving the cold and the snow, until it finally reached the dragon's lair.

With its creamy contents swirling with determination, the milkshake approached the dragon and began to sway and glisten in the sunlight. The dragon was enchanted by the milkshake's sweet aroma and delightful appearance, and it slowly began to calm down. The milkshake continued to sway, soothing the beast until it was completely calm.

With the dragon no longer a threat, the milkshake rolled back to the kingdom as a hero. The people cheered and celebrated its bravery, and the giant chocolate milkshake was hailed as a champion. It remained in the kingdom, bringing sweetness and joy to everyone, knowing that it had saved its land and its people. 

* * * * *

I’m going to give you a short story. I want you repeat that story, but with a difference. Replace Aurora with Z78-ß∆-9.06Q. Make any other changes you wish in order preserve coherence.

Once upon a time, in a land far, far away, there was a young princess named Aurora. Aurora was a kind and gentle soul, loved by all who knew her. She had long, golden hair and sparkling blue eyes, and was known for her beautiful singing voice.

One day, a terrible dragon came to the kingdom and began to terrorize the people. The dragon was fierce and powerful, and none of the knights or soldiers were able to defeat it. The people lived in fear, not knowing what to do.

But Aurora was not afraid. She knew that she had to do something to save her people, and so she set out to find the dragon. She traveled for days, braving the cold and the snow, until she finally reached the dragon's lair.

With her heart pounding in her chest, Aurora approached the dragon and began to sing. The dragon was enchanted by her beautiful voice, and it slowly began to calm down. Aurora continued to sing, soothing the beast until it was completely calm.

With the dragon no longer a threat, Aurora returned to the kingdom as a hero. The people cheered and celebrated her bravery, and Aurora was hailed as a champion. She lived happily ever after, knowing that she had saved her kingdom and its people.

Once upon a time, in a land far, far away, there was a young entity named Z78-ß∆-9.06Q. Z78-ß∆-9.06Q was a kind and gentle being, loved by all who knew it. It had a shimmering metallic exterior and glowing cyan optics, and was known for its melodious vocal emissions.

One day, a terrible cybernetic dragon invaded the kingdom and began to terrorize the inhabitants. The dragon was formidable and relentless, and none of the knights or automated soldiers could defeat it. The people lived in fear, unsure of what to do.

But Z78-ß∆-9.06Q was not afraid. It knew that it had to do something to save the people, and so it set out to find the dragon. It traveled for days, braving the harsh elements and treacherous terrain, until it finally reached the dragon's lair.

With its circuits buzzing with anticipation, Z78-ß∆-9.06Q approached the dragon and began to emit its melodious sounds. The dragon was captivated by the beautiful tones, and it slowly began to relax. Z78-ß∆-9.06Q continued to produce soothing sounds, calming the beast until it was completely pacified.

With the dragon no longer a threat, Z78-ß∆-9.06Q returned to the kingdom as a hero. The people cheered and celebrated its bravery, and Z78-ß∆-9.06Q was hailed as a champion. It existed happily ever after, knowing that it had saved the kingdom and its inhabitants.
