Wednesday, April 12, 2023

Constructing ‘Agent’ AIs, natural language computers

Caveat: This is mostly a note to myself, but you’re welcome to look over my shoulder.

Let’s start with a tweet stream. Here are the first two tweets:

Now we zip on over to LessWrong for a post that abstracts and generalizes over this idea, Scaffolded LLMs as natural language computers. Here’s a passage:

What we have essentially done here is reinvented the von-Neumann architecture and, what is more, we have reinvented the general purpose computer. This convergent evolution is not surprising -- the von-Neumann architecture is a very natural abstraction for designing computers. However, if what we have built is a computer, it is a very special sort of computer. Like a digital computer, it is fully general, but what it operates on is not bits, but text. We have a natural language computer which operates on units of natural language text to produce other, more processed, natural language texts. Like a digital computer, our natural language (NL) computer is theoretically fully general -- the operations of a Turing machine can be written as natural language -- and extremely useful: many systems in the real world, including humans, prefer to operate in natural language. Many tasks cannot be specified easily and precisely in computer code but can be described in a sentence or two of natural language.

Armed with this analogy, let's push it as far as we can go and see where the implications take us.

First, let's clarify the mappings between scaffolded LLM components and the hardware architecture of a digital computer. The LLM itself is clearly equivalent to the CPU. It is where the fundamental 'computation' in the system occurs. However, unlike the CPU, the units upon which it operates are tokens in the context window, not bits in registers. If the natural type signature of a CPU is bits -> bits, the natural type of the natural language processing unit (NLPU) is strings -> strings. The prompt and 'context' are directly equivalent to the RAM. This is the easily accessible memory that can be rapidly operated on by the CPU. Thirdly, there is the memory. In digital computers, there are explicit memory banks or 'disk' which have slow access memory. This is directly equivalent to the vector database memory of scaffolded LLMs. The heuristics we currently use (such as vector search over embeddings) for when to retrieve specific memory are equivalent to the memory controller firmware in digital computers which handles accesses for specific memory from the CPU. Next, it is also necessary for the CPU to interact with the external world. In digital computers, this occurs through 'drivers' or special hardware and software modules that allow the CPU to control external hardware such as monitors, printers, mice etc. For scaffolded LLMs, we have plugins and equivalent mechanisms. Finally, there is also the 'scaffolding' code which surrounds the LLM core. This code implements protocols for chaining together individual LLM calls to implement, say, a ReAct agent loop, or a recursive book summarizer. Such protocols are the 'programs' that run on our natural language computer.
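
To make that mapping concrete, here is a rough sketch in Python. It's my own illustration, not code from the post: every name is invented, the nlpu argument stands in for whatever completion call you care to plug in, and the vector-database retrieval is faked with simple word overlap.

```python
from typing import Callable, Dict, List

# The NLPU plays the role of the CPU; its natural type is strings -> strings.
NLPU = Callable[[str], str]

class NLComputer:
    def __init__(self, nlpu: NLPU):
        self.nlpu = nlpu                  # "CPU": the LLM itself
        self.context: List[str] = []      # "RAM": the prompt / context window
        self.memory: List[str] = []       # "disk": slow long-term store (a vector DB in practice)
        self.drivers: Dict[str, Callable[[str], str]] = {}  # "drivers": plugins and tools

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        # Stand-in for the memory controller: real systems do vector search over
        # embeddings; here crude word overlap decides what gets paged into "RAM".
        words = set(query.lower().split())
        scored = sorted(self.memory, key=lambda m: -len(words & set(m.lower().split())))
        return scored[:k]

    def step(self, instruction: str) -> str:
        # One NLOP: assemble context plus retrieved memory into a prompt,
        # run it through the NLPU, and write the result back to "RAM".
        prompt = "\n".join(self.context + self.retrieve(instruction) + [instruction])
        result = self.nlpu(prompt)
        self.context.append(result)
        return result
```

All the interesting work happens in whatever you plug in as the NLPU; the scaffolding just shuttles strings around, which is exactly the point of the analogy.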

Here's a comment on programming languages:

The obvious thing to think about when programming a digital computer is the programming language. Can there be programming languages for NL computers? What would they look like? Clearly there can be. We are already beginning to build up the first primitives. Chain of thought. Selection-inference. Self-correction loops. Reflection. These sit at a higher level of abstraction than a single NLOP. We have reached the assembly languages. CoT, SI, and reflection are the mov, leq, and goto which we know and love from assembly. Perhaps with libraries like LangChain and complex prompt templates, we are beginning to build our first compilers, although they are currently extremely primitive. We haven't yet reached C. We don't even have a good sense of what it will look like. Beyond this simple level, there are so many more abstractions to explore that we haven't yet even begun to fathom. Unlocking these abstractions will require time as well as much greater NL computing power than is currently available. This is because building non-leaky abstractions comes at a fundamental cost. Functional or dynamic programming languages are always slower than bare-metal C and this is for a good reason. Abstractions have overheads, and while we are as limited by NLOPs as we currently are, we cannot usefully use or experiment with these abstractions; but we will.
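
Continuing my sketch from above, those "assembly language" primitives would be small protocols built out of single NLOPs. Again, this is my own illustration, not the post's; the prompt wordings are invented stand-ins for whatever chain-of-thought or self-correction templates one actually uses.

```python
def chain_of_thought(nlc: NLComputer, question: str) -> str:
    # CoT as a single "instruction": ask for intermediate reasoning before the answer.
    return nlc.step(
        f"Question: {question}\nThink through this step by step, then give the answer."
    )

def self_correction_loop(nlc: NLComputer, task: str, rounds: int = 2) -> str:
    # Self-correction / reflection as a fixed-length loop of NLOPs: draft, critique, revise.
    draft = nlc.step(f"Task: {task}\nWrite a first attempt.")
    for _ in range(rounds):
        critique = nlc.step(f"Critique this attempt and list its weaknesses:\n{draft}")
        draft = nlc.step(
            f"Revise the attempt using the critique.\nCritique:\n{critique}\nAttempt:\n{draft}"
        )
    return draft
```

Seen this way, something like LangChain's prompt templates is, in effect, a first crack at the compiler layer: a higher-level description of a protocol like these gets expanded into the sequence of NLOPs that actually runs.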

The general idea is this: instead of thinking of the LLM as THE device, tricked out with various gadgets and appliances to make it usable, think of it as a particularly important component in a set of components for building things, analogous to Legos, or an Erector Set (from the ancient days).

In general, I like the idea of linking LLMs together. This feels like some folks are inching toward serious ideas about how a mind might work.

You might want to think in terms of Thought as Inner Speech, and Vygotsky’s account of language development. Also, think about Walter Freeman’s concept of the “cinematic” mind in the context of token generation. And then there's the idea of language as an index over mental space, which you can find in the paper David Hays and I did, Principles and Development of Natural Intelligence, and in my recent Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind.
