Tuesday, December 13, 2022

The current state of AI: ChatGPT and beyond [Ramble]

I’ve been spending A LOT of time with ChatGPT; my MS Word doc containing a record of those interactions is currently 116 pages and 38K words long. I don’t know how many hours I’ve logged. Alas, I’m falling behind on writing it up.

I’m also involved in an interesting conversation with Bruce Smith at Scott Aaronson’s blog. Beyond that, while I’ve been hanging out at LessWrong for over half a year, I’ve mostly been observing. But ChatGPT has gotten me posting over there, commenting on others’ posts, and I’ve also been cross-posting some of my New Savanna ChatGPT posts. I’m planning on consolidating some of my most recent comments and expanding them in a post specifically for LessWrong. So I’m becoming more of a participant.

All of this has me thinking. My purpose here is to run up a quick sketch. I’ll elaborate later.

Latent patterns in ChatGPT

My first major bit of work with ChatGPT was guiding it through a Girardian analysis of Spielberg’s Jaws. That got me to thinking about latent patterns, patterns that are somehow there, but not explicitly. Rather, they self-assemble in response to prompts. I’m thinking that these patterns represent at least four levels of organization in the model:

  1. word-level semantics
  2. sentence-level linkages, i.e. syntax
  3. discourse level patterns (e.g. conversational turn-taking)
  4. abstract patterns over stories

The post on Jaws represents the fourth level. My posts on Spielberg’s AI and Tezuka’s Astro Boy stories are also about level 4 abstract patterns. My interest in such patterns stems from the work I did early in my career with David Hays. That work was on classical symbolic systems (GOFAI), specifically semantic or cognitive networks. Similarly, I have two blog posts about level 3 structures. Taking this work in conjunction with various work others have done on interpreting what’s in LLMs, I provisionally conclude that something like a semantic network is latent in LLMs.
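For readers who haven’t met the term, here’s a minimal sketch in Python of what I mean by a semantic (cognitive) network: concepts as nodes, typed relations as labeled edges. The particular concepts and relations below are made-up illustrations, loosely echoing the Jaws reading above; nothing here is extracted from an actual LLM.

```python
# Minimal sketch of a semantic (cognitive) network: concepts as nodes,
# typed relations as labeled, directed edges. The entries below are
# illustrative only; nothing here is extracted from an actual model.

from collections import defaultdict

class SemanticNet:
    def __init__(self):
        # relation name -> list of (head, tail) concept pairs
        self.edges = defaultdict(list)

    def add(self, head, relation, tail):
        self.edges[relation].append((head, tail))

    def query(self, relation):
        return list(self.edges[relation])

net = SemanticNet()
# A few story-level relations of the kind a Girardian reading of Jaws turns on.
net.add("shark", "threatens", "Amity")
net.add("Brody", "protects", "Amity")
net.add("community", "selects", "scapegoat")

print(net.query("threatens"))  # [('shark', 'Amity')]
```

The point of the toy is only that such a network is explicit and inspectable; my provisional claim is that something with this kind of relational structure is present in an LLM, but only latently.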

But then I would conclude that, wouldn’t I? After all, ever since that early work with Hays I’ve thought that understanding how a mind is implemented in a brain hinges on figuring out how a large neural network gives rise to a cognitive net. I discuss that in some detail in a recent working paper, Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind, Version 2. My working paper from 2020, GPT-3: Waterloo or Rubicon? Here be Dragons, contains useful remarks on the relationship between classical cognitive nets and LLMs.

Three major technical issues:

Scaling – Can we go “all the way” simply by scaling up artificial neural networks of various kinds? I don’t think so. Neither, for example, do Gary Marcus and Yann LeCun, who are otherwise very different thinkers (and often at loggerheads on Twitter).

Symbols – Do we need to deal with symbols? Gary Marcus says we do. Yann LeCun is listening, but skeptical. I think symbols are necessary. The issue is how to integrate them with neural nets. You can’t just bolt symbolic processing onto neural nets; the symbolic processing must arise from within the network. My working paper, Relational Nets Over Attractors, addresses that issue in the context of the brain. I note that my arguments (above) on discourse-level and abstract patterns suggest that ChatGPT is doing (an impressive imitation of) some kinds of symbolic processing.

Just how to implement something like that in an artificial system, I do not know. For all I know it may require new hardware that is more analog in nature. This post on Saty Chary’s structured physical system hypothesis (SPSH) isn’t quite there, but it points in that direction.

Embodiment, linkage to the physical world – Large language models are “trained” in a purely textual environment, with no access to the physical world. Artificial visual systems are trained on photographs and video, but they don’t have direct access to the physical world either. I believe that the problem of common-sense reasoning is entwined with this issue – see GPT-3: Waterloo or Rubicon? for some discussion of this.

Robots and self-driving cars, on the other hand, have to function in the physical world and so their systems must deal, not only with perceiving it, but with moving about in it. I am currently of the opinion that there is no substitute for direct physical interaction. Eric Jang, VP for AI at Halodi, imagines a world in which data from robots is collected and synthesized as they roam the world, going about their appointed rounds. Think about what that implies for the organization of the development, deployment, and maintenance of this technology.

What kind of beasts are these?

Let’s think more generally about these artificial neural nets, such as ChatGPT. What kind of thing/being/creature are they? Whatever they are, at their core they operate on principles and mechanisms unlike any other kind of program, that is, the kind of program where a programmer or team of programmers generates every line “by hand,” if only through calls to libraries, and where each line and block is there to serve a purpose known to the programmers.

That’s not how ChatGPT was created. Its core consists of a large language model that was compiled by an engine, a Generative Pre-trained Transformer (GPT), that “takes in” (that is, is “trained on”) a huge corpus of texts. The resulting model is opaque to us; we didn’t program it. The resulting behavioral capacities are unlike those of any other creature/being/thing we’ve experienced, nor do we know what those capacities will evolve into in the future. This creature/being/thing is something fundamentally NEW in the universe, at least our local corner of it, and needs to be thought of appropriately. It deserves/requires a new term.
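To see the difference in miniature, here’s a toy sketch: a next-word predictor “compiled” from a corpus by nothing more than bigram counting. It is nothing like a real transformer, and the one-line corpus is a made-up example, but the principle is the same: nobody writes the model’s behavior line by line; it is induced from whatever text the engine takes in.

```python
# Toy next-word predictor built by counting bigrams in a tiny corpus.
# A crude stand-in for the real thing: a GPT learns vastly richer
# statistics with a neural network, but the principle is the same --
# the behavior is compiled from text, not written line by line.

from collections import Counter, defaultdict
import random

corpus = "the shark threatens the town and the town fears the shark".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(word):
    """Sample a continuation in proportion to observed counts."""
    counts = follows[word]
    if not counts:
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation from a prompt word.
word, out = "the", ["the"]
for _ in range(6):
    word = next_word(word)
    if word is None:
        break
    out.append(word)
print(" ".join(out))
```

No line of that code says what should follow “the”; the counts do. Scale the corpus up to a large slice of the web and swap the counting for a transformer, and you get the opaque, unprogrammed core I’m pointing at.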

Consider an analogy I’ve just thought up (so I don’t know where it leads): think about a sheep herder and their dog. They direct the dog to round up the sheep. The dog does so. Who/what rounded up the sheep? The herder or the dog? The herder couldn’t have accomplished that task without the dog. Of course, they (or someone else) trained the dog (which is of a breed developed for this purpose). The training, and, for that matter, the breeding as well, would have been useless without the capacities inherent in the dog. So we should assign some ontological credit to the herder for that. But some credit surely belongs to the dog (to Nature, if you will) as well.

Those dogs have evolved from animals that were created by Nature. Think of the opaque engines at the heart of these new AI systems as a different kind of wildness. An artificial wildness, if you will, but still wild and thus foreign and opaque to us.

Scott Alexander has an interesting article on the chaos ChatGPT has been inadvertently generating because it wasn’t intended to say those things. He is, in effect, treating ChatGPT as an overflowing well of wildness. That’s what all these deep learning engines are producing: an ever-expanding sea of artificial wilderness. I don’t think we know how to deal with that. How do we domesticate it?

Our brave new world

I’m imagining a future in which communities of humans continue interacting with one another, as we do now, but also interact with computers, as we do now, and in the different ways this new technology requires. And those new creatures will form their own community among themselves, interacting with one another as they interact with us. They’re figuring us out; we’re figuring them out. Think of the dogs.

But also think of Neal Stephenson’s The Diamond Age: Or, A Young Lady’s Illustrated Primer. Think of a world in which everyone has such a primer, a personal robot companion. That’s my best guess about where we’re headed.

More later.
