Friday, November 24, 2023

We’ll have a reasonable model of how LLMs function before we reach AGI*

I understand that the concept of AGI is vague, but a lot of smart people have a lot invested in it and in making predictions about when we’ll construct one. That’s the reason I’m using it. And I’m saying that we’ll understand how LLMs work before we create an AGI. Moreover, since it’s not clear to me that we’ll ever create an AGI (we may just lose interest), I’ll state the second half on its own: I believe that we will arrive at a reasonable understanding of how LLMs work, whether or not AGI ever arrives.

On that score, here’s something that I’ve posted at LessWrong:

In the days of classical symbolic AI, researchers would use a programming language, often some variety of LISP, but not always, to implement a model of some set of linguistic structures and processes, such as those involved in story understanding and generation, or question answering. I see a similar division of conceptual labor in figuring out what’s going on inside LLMs. In this analogy I see mechanistic understanding as producing the equivalent of the programming languages of classical AI. These are the structures and mechanisms of the virtual machine that operates the domain model, where the domain is language in the broadest sense. I’ve been working on figuring out a domain model and I’ve had unexpected progress in the last month. I’m beginning to see how such models can be constructed. Call these domain models meta-models for LLMs.

It’s those meta-models that I’m thinking are five years out. What would the scope of such a meta-model be? I don’t know. But I’m not thinking in terms of one meta-model that accounts for everything a given LLM can do. I’m thinking of more limited meta-models. I figure that various communities will begin creating models in areas that interest them.

I figure we start with some hand-crafting to work out some standards. Then we’ll go to work on automating the process of creating the model. How will that work? I don’t know. No one’s ever done it.

I know how to get that started, but it will take others with skills I don’t have (in math and programming) to make it work.

It’s going to take me a while to get my recent insights written up in a form I can present to others, but I’m building on work I’ve been doing on ChatGPT over the last year. These are the most important papers:

I’ll offer one last remark: These meta-models will make crucial use of work done in symbolic computing back in the 1960s and 1970s.


*Perhaps AGI will seem less seductive by that time. Perhaps, just as the concept of phlogiston gave way to the concept of oxidation, the concept of AGI will give way to...something else. Something more mundane perhaps, but also more useful and real, and more interesting.
