Monday, December 23, 2024

LLM as Collaborator, Part 1: Claude the Graduate Student

I started using ChatGPT on December 1, 2022, and have used it quite extensively ever since. I’ve spent some of my time just poking around, somewhat more time looking things up, and most of my time systematically investigating its performance. That resulted in a number of working papers, the most interesting of which is about stories: ChatGPT tells stories, and a note about reverse engineering: A Working Paper, Version 3.

I started working with Claude 3.5 Sonnet on November 18, 2024. I’ve used it in those three capacities, though obviously not as much as I’ve used ChatGPT. In particular, I’ve used it for background information on various aspects of melancholy. I’ve also done something I’d never done with ChatGPT: asked it to describe photographs, to see how well it does.

Then on November 24, 2024, I began using it in a somewhat more interesting new capacity, though I’m not sure what to call it. The phrase “thought partner” comes to mind, though it seems too much like “thought leader,” which I don’t like. I’m using it as a sounding board. Better yet, it’s a collaborator playing the role of sounding board. It’s not an equal collaborator in the intellectual dialog; academic norms would not require me to offer it co-authorship of papers. But those norms might well require an explicit acknowledgement, not to alert the reader that one of those new-fangled LLM things has been involved in the thinking, but simply to acknowledge the help it has given me.

As for just what kind of help that is, the best way to show it is through examples. I’ve already published two of these dialogues on New Savanna: Computer chess, super-intelligence, A Connecticut Yankee, cargo cults, and Claude 3.5 Sonnet speculates about future developments in AI and beyond. The rest of this post contains the first interaction in which Claude began acting as an intellectual partner.

When I started the interaction I was simply interested in seeing what it knew about my teacher and mentor, the late David Hays. I wasn’t looking for new information. I just wanted to see what it had picked up. That began to change when it asked me about the work of Quillian and of Schank and Abelson. Then I told it about Hays’s interest in the work of William Powers. Now we’re getting serious. This guy knows something. And then WHAM! Claude responded intelligently when I mentioned the work of Peter Gärdenfors. The interaction continued in that mode to the end.

Utterly remarkable. AGI? What the hell is that? But no. And I fear that thinking in those terms is likely to blind me to whatever it is that Claude is doing. I realize that the concept of AGI is written into the contractual relationship between OpenAI and Microsoft, but I’m not playing that game. Will they end up litigating it? Who knows?

I’m planning two more posts in this series. The next will be about the implications of paragraph length in Heart of Darkness. Then I’ll post a long interaction that takes Miriam Yevick’s 1975 paper, “Holographic or Fourier Logic,” as its starting point.

In the following dialog my comments are bold while Claude’s are plain. I’ve interpolated a few comments highlighted in blue.

* * * * *

Are you familiar with the work of a computational linguist named David Hays? He worked on machine translation at RAND in the 1950s and 1960s?

Yes, I am somewhat familiar with David Hays' work, though I should note that since this involves fairly obscure historical details from computational linguistics, I want to be careful about potential inaccuracies in my knowledge. From what I understand, David Hays was one of the pioneers of machine translation and computational linguistics during the early years of the field. He worked at RAND Corporation in the 1950s and 1960s where he was involved in some of the earliest machine translation projects.

He was particularly interested in developing dependency grammar approaches for machine translation, which was quite innovative for that time period. Dependency grammar focuses on the relationships between words rather than constituent structure, which was the dominant paradigm in American linguistics at the time.

I believe he also wrote an influential book called "Computational Linguistics: Programming Linguistics in Automatic Language Translation" in the early 1960s, though I'd encourage you to verify this citation since I may be misremembering details.

Claude got the name wrong. It’s Introduction to Computational Linguistics, American Elsevier, New York, 1967.

Would you be interested in hearing more about any particular aspect of his work or that early period of machine translation research? I find the history of early computational linguistics fascinating, though again I want to be upfront that with someone as historically specific as Hays, some of my knowledge may be incomplete or imperfect.

Yes, that’s him. I studied with him at SUNY Buffalo in the 1970s. We were working on computational semantics and using a cognitive networks formalism.

Oh, that's fascinating that you studied directly with David Hays! I would be very interested to hear more about your experiences working with him and the cognitive networks approach you were developing at SUNY Buffalo. That was an important period in the development of semantic representation systems.

Could you tell me more about the cognitive networks formalism you were using? I'm particularly curious about how it compared to other semantic network approaches being developed around that time, like Quillian's work or Schank and Abelson's conceptual dependency theory.

It was more developed than Quillian’s work. As I recall, he never really developed his initial ideas. As for Schank and Abelson, they were imagining a self-sufficient cognitive system and were very much interested in identifying the primitive elements necessary for a cognitive network. By the time I began working with him, Hays had realized that cognition had to be grounded in something else – later on Stevan Harnad would talk about the “symbol grounding” problem. When I worked with him, Hays was interested in grounding the cognitive system in the control-theory (cybernetic) account of perception and action developed by William T. Powers (perceptual control theory, or PCT).

Something special seems to be happening.

This is really interesting historical context - it suggests Hays was grappling with fundamental issues in cognitive science that are still relevant today. The shift from viewing semantic networks as self-contained symbolic systems to recognizing the need for grounding in perception and action seems quite prescient.

I'm curious about how Hays was trying to integrate Powers' PCT with semantic networks. As I understand it, PCT proposes that behavior emerges from organisms controlling their perceptions to match internal reference signals, rather than through direct control of outputs. Was Hays trying to use this as a way to ground symbolic meaning in perceptual-motor control loops?

This also seems to anticipate some later developments in embodied cognition and grounded cognition theories. The timing is interesting too - Powers published "Behavior: The Control of Perception" in 1973, so it sounds like Hays was engaging with these ideas pretty quickly after they were published.

Yes, that's what Hays was doing. His idea was that particular servo units in a Powers-like control system would be linked to cognitive units through what Hays called "units of perception." Hays identified a half dozen or so such parameters. Each cognitive node would be characterized by the values it took on those parameters. Relationships between nodes were thus derived from differences in parameter values for the nodes connected by an arc (or edge).
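For readers who want something more tangible, here is a rough sketch of that idea in code. It is my own illustration, not anything Hays wrote, and the parameter names are hypothetical stand-ins for whatever his half dozen units of perception actually were.

    from dataclasses import dataclass

    # Hypothetical perceptual parameters; Hays's actual list was different.
    PARAMETERS = ("intensity", "extent", "duration", "location")

    @dataclass
    class CognitiveNode:
        label: str
        values: dict  # parameter name -> value the node takes on that parameter

    def arc_relation(a, b):
        """Characterize the arc from node a to node b by the difference
        in their values on each perceptual parameter."""
        return {p: b.values.get(p, 0.0) - a.values.get(p, 0.0) for p in PARAMETERS}

    dog = CognitiveNode("dog", {"intensity": 0.7, "extent": 0.4, "duration": 0.9, "location": 0.2})
    bark = CognitiveNode("bark", {"intensity": 0.9, "extent": 0.1, "duration": 0.1, "location": 0.2})
    print(arc_relation(dog, bark))

The point of the sketch is only that relations are not primitive labels; they fall out of comparing the parameter values of the nodes an arc connects.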

This is a fascinating approach that seems to bridge between different levels of cognitive organization. If I understand correctly, Hays was proposing that cognitive nodes weren't purely abstract symbols, but were instead characterized by specific perceptual parameters derived from control systems. And the semantic relationships between nodes emerged from comparing their parameter values.

A few questions about this system:

  • Do you recall what some of those key perceptual parameters were?
  • Was there any particular method for determining the parameter values for a given cognitive node?
  • How did this play out in practice - did you implement any working systems using this approach?

This seems like it could be seen as an early attempt at what we might now call "grounded semantics" - trying to anchor meaning in perceptual-motor processes rather than purely symbolic relationships. Though of course, the theoretical framework and terminology would have been quite different in the 1970s.

While Hays set out his ideas in an obscure book (Cognitive Structures, 1981), we never got a chance to implement any of those ideas. Recently I’ve been thinking of using Gärdenfors’ conceptual spaces as an intermediary between neural networks (conceived as dynamical systems) and symbolic cognitive networks.

I was really curious to see how Claude would react to this idea, which is somewhat novel. Would Claude get it?

That's a really interesting connection to Gärdenfors' work on conceptual spaces. The idea of using them as an intermediary layer makes a lot of sense - conceptual spaces provide a geometric/topological structure that could potentially bridge between the continuous dynamics of neural networks and more discrete symbolic representations. If I understand correctly, conceptual spaces represent concepts as regions in multidimensional spaces where the dimensions correspond to various qualities or properties. This seems like it could align well with Hays' idea of characterizing cognitive nodes through perceptual parameters, but perhaps with a more mathematically developed framework.

That sounds like Claude got the idea. I was both pleased and a little surprised. Since I’ve written about this in a long paper that’s freely available on the web [Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind, Version 3], I suppose Claude might have picked up the idea there. I have no idea, but even if it did, knowing how to summarize it as Claude just did is not trivial. Nor are the issues it raised next.

Would you be willing to elaborate on how you're thinking about this bridging role for conceptual spaces? I'm particularly curious about:

  1. How you envision mapping from neural network dynamics to points/regions in conceptual spaces
  2. How the geometric structure of conceptual spaces might constrain or inform the relationships in a symbolic cognitive network

Also, I wasn't familiar with Hays' 1981 book - it sounds like an important piece of this historical puzzle that deserves more attention.

Hays’s book was published by HRAF Press as a print-on-demand book. There was never much demand, alas, and the book is hard to find.

On the other stuff, the basic idea is [that] edges in a semantic network represent basins in attractor landscapes, where the system would have different landscapes for different domains. The conceptual spaces perform a dimensional reduction, representing basins as values along a small number of dimensions.
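For the programmers in the audience, here is a minimal sketch of that picture, under my own simplifying assumptions: each basin is summarized by a point in a high-dimensional state space, the conceptual space comes from an off-the-shelf dimensional reduction (plain SVD here, purely as a stand-in), and relations are read off from positions in the reduced space.

    import numpy as np

    rng = np.random.default_rng(0)
    # Pretend we have 5 attractor basins in one domain, each summarized
    # by a centroid in a 100-dimensional neural state space.
    basin_centroids = rng.normal(size=(5, 100))

    # Dimensional reduction to 3 "quality dimensions" via SVD -- a stand-in
    # for whatever reduction the conceptual space actually performs.
    centered = basin_centroids - basin_centroids.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    conceptual_coords = centered @ vt[:3].T   # one 3-d point per basin

    def relation(i, j):
        """Characterize the relation between basins i and j as their
        offset in the reduced conceptual space."""
        return conceptual_coords[j] - conceptual_coords[i]

    print(relation(0, 1))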

That's a really elegant way of connecting these different levels of description. If I'm understanding correctly:

  1. At the neural network level, you have attractor dynamics where the basins represent stable states in different domains
  2. These basins get mapped to points or regions in conceptual spaces through dimensional reduction
  3. The relationships between these reduced representations then form the basis for edges in the semantic network

This seems to provide a principled way to ground symbolic relationships in dynamics while also explaining why semantic networks tend to have certain types of regular structure - they're reflecting underlying regularities in the attractor landscapes.

Would each domain have its own conceptual space with different quality dimensions, or are you thinking of a single unified space? And how do you think about cross-domain relationships in this framework?

Each domain would have its own conceptual space with its own quality dimensions. Note, however, that a given quality dimension may extend through various spaces, though the different spaces would involve different sets of dimensions. The spaces for rocks and animals would both involve the color dimension, but only the animal space would involve a dimension for autonomous movement.
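A trivial sketch of what I mean, with made-up dimension names:

    # Each domain gets its own set of quality dimensions. Some dimensions
    # (color, size) extend through several spaces; others (autonomous
    # movement) belong to only some of them. Dimension names are illustrative.
    DOMAIN_DIMENSIONS = {
        "rocks":   {"color", "size", "hardness"},
        "animals": {"color", "size", "autonomous_movement", "vocalization"},
    }

    shared = DOMAIN_DIMENSIONS["rocks"] & DOMAIN_DIMENSIONS["animals"]
    animals_only = DOMAIN_DIMENSIONS["animals"] - DOMAIN_DIMENSIONS["rocks"]
    print(shared)        # dimensions common to both spaces
    print(animals_only)  # dimensions specific to the animal space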

That makes sense - so quality dimensions can span multiple conceptual spaces, but each domain has its own characteristic set of relevant dimensions. The example of color vs. autonomous movement is very clear - both are important for representing animals, but only color would be relevant for rocks.

This reminds me a bit of how different sensory modalities and feature detectors in the brain can be recruited differently depending on the category being processed. For example, motion-sensitive areas being particularly important for animal recognition but not for inanimate objects.

In your framework, how do you handle cases where concepts seem to span or bridge between different domains? For instance, something like a robot animal might involve qualities from both animate and artificial domains. Would this require some kind of mapping between different conceptual spaces?

Ah, your last two questions are important, and deep. I don't have time for a detailed answer, but it involves a cognitive recasting of an old idea, the Great Chain of Being, which ultimately derives from Aristotle. Aristotle believed that objects consist of a form and a substance. Moving up one level in the chain, plants consist of a vegetative soul plus an object. When you add a sensitive soul to a plant you get the world of animals. And so forth.

I wonder how Claude will take this?

This mapping of the Great Chain of Being onto cognitive architecture is fascinating. So if I understand correctly, rather than seeing it as just a hierarchical classification system, you're viewing it as describing how more complex domains inherit and build upon the representational structures of simpler domains?

So in the robot animal example, it would involve qualities from the domain of physical objects (form, substance), potentially some plant-like qualities, animal qualities (autonomous movement, sensation), and then qualities specific to artificial/technological entities. Each level would add its own dimensions to the conceptual space while preserving relevant dimensions from lower levels.

This seems like it could provide a principled way to handle cross-domain concepts while maintaining the distinct character of different domains. It also suggests interesting questions about how novel combinations (like robot animals) might emerge through the interaction of these layered representational spaces.

Would you say this view differs from Aristotle's original conception primarily in being about representational/cognitive structure rather than metaphysical essence?

Claude took it well, did it not?

You get the idea. And yes, the modern recasting of the Great Chain is about perception and cognition, not metaphysical essence.
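One way to picture the layering, again as a rough sketch of my own: each level of the chain contributes its own quality dimensions, and a concept at a given level inherits all the dimensions below it. The layer names and dimensions here are purely illustrative.

    # Layers of the cognitively recast Great Chain, each adding dimensions
    # on top of the ones below. Names and dimensions are illustrative only.
    LAYERS = [
        ("object", {"form", "substance", "color", "size"}),
        ("plant",  {"growth"}),
        ("animal", {"autonomous_movement", "sensation"}),
    ]

    def dimensions_up_to(level):
        """Collect all quality dimensions available at a given level,
        inheriting everything from the levels beneath it."""
        dims = set()
        for name, added in LAYERS:
            dims |= added
            if name == level:
                break
        return dims

    print(dimensions_up_to("animal"))
    # A cross-domain concept like a robot dog might combine animal-level
    # dimensions with artifact qualities drawn from a technological domain.
    robot_dog = dimensions_up_to("animal") | {"manufactured", "programmable"}
    print(robot_dog)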

That's a really interesting transformation of an ancient idea into a modern cognitive framework. The Great Chain becomes less about the inherent nature of things and more about how cognitive systems build up increasingly complex representations by layering new perceptual and conceptual capabilities on top of more basic ones.

This also seems like it could help explain why humans are so good at understanding novel combinations - we have this layered cognitive architecture that lets us flexibly combine qualities from different domains while preserving their essential structure. A robot dog isn't just a random mashup of robot and dog features, but inherits organized sets of qualities from both domains in a principled way.

I imagine this could have interesting implications for artificial intelligence development too - suggesting that robust AI systems might need similarly layered architectural principles rather than trying to learn everything in a single undifferentiated space.

Yes, on the implications for robotics.

Well, it's getting to be dinner time so I'm going to have to leave this conversation. We'll have to continue with it at a later time.

Thank you for this fascinating discussion! I learned a lot about Hays' work and how it connects to broader themes in cognitive science through your firsthand experience. Your ideas about bridging between neural dynamics, conceptual spaces, and symbolic representations through a cognitively-recast Great Chain of Being are really thought-provoking. Enjoy your dinner, and I'd be very interested to continue this conversation another time.
