Wednesday, May 29, 2024

How to Build & Understand GPTs

This conversation runs for over three hours. I've not yet listened to the whole thing; I'm about 2 hours and 15 minutes in, and that's taken me three or four sittings. I find it interesting. Yes, it's technical, a bit out of my range, but not so far that I can't get a feel for what's going on. The opening discussion of long contexts is interesting, and so is the discussion of feature spaces, which is where I am now. Here are the timestamps:

(00:00:00) - Long contexts
(00:17:04) - Intelligence is just associations
(00:33:27) - Intelligence explosion & great researchers
(01:07:44) - Superposition & secret communication
(01:23:26) - Agents & true reasoning
(01:35:32) - How Sholto & Trenton got into AI research
(02:08:08) - Are feature spaces the wrong way to think about intelligence?
(02:22:04) - Will interp actually work on superhuman models
(02:45:57) - Sholto's technical challenge for the audience
(03:04:49) - Rapid fire

Here's a comment I made:

Two things, both about superposition: first a note about the brain, and then a note about linguistics.

FWIW, a bit over two decades ago I had extensive correspondence with the late Walter Freeman at Berkeley, who was one of the pioneers in applying complexity theory to the study of the brain. He pretty much assumed that any given neuron (with its 10K connections to other neurons) would participate in many perceptual or motor schemas. The fact that now and then you'd come up with neurons that had odd-ball receptive properties (e.g. responding to a monkey's paw, or to Bill Clinton) was interesting, but hardly evidence for the existence of so-called grandmother neurons (i.e. a neuron for your grandmother and, by extension, individual neurons for individual perceptual objects). As far as I can tell, the idea of neural superposition goes back decades, at least to the late 1960s, when Karl Pribram and others started thinking about the brain in holographic terms.

Setting that aside, a somewhat limited form of superposition has been common in linguistics going back to the early 20th century. It's the basic idea underlying the concept of distinctive features in phonetics/phonology. Speech sound is continuous, but we hear language in terms of discrete segments, called phonemes. Phonemes are analyzed in terms of distinctive features, that is, the sound features that distinguish one speech sound from another in a given language. The number of distinctive features in a language is smaller than the number of phonemes: I don't know offhand what the range is, but a language's phoneme inventory is on the order of tens, and the number of distinctive features it uses will be somewhat smaller. So each phoneme can be identified by a superposition of distinctive features.

The numbers involved are obviously way smaller than the numbers of features and parameters in an LLM. But the principle seems to be the same.
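
To make the phoneme analogy a bit more concrete, here's a toy sketch in Python. The feature assignments are deliberately simplified and invented for illustration, not a serious phonological analysis; the point is only that a small set of binary features, taken in combination, is enough to pick out a larger set of phonemes.

# Toy illustration: a handful of English-like consonants, each identified by a
# small bundle of binary distinctive features. Feature assignments are
# simplified for illustration only.

PHONEMES = {
    #        voiced  nasal  labial  coronal
    "p": (0, 0, 1, 0),
    "b": (1, 0, 1, 0),
    "m": (1, 1, 1, 0),
    "t": (0, 0, 0, 1),
    "d": (1, 0, 0, 1),
    "n": (1, 1, 0, 1),
}

FEATURES = ("voiced", "nasal", "labial", "coronal")

def identify(feature_bundle):
    """Return the phoneme whose feature values match the given bundle."""
    for phoneme, values in PHONEMES.items():
        if values == feature_bundle:
            return phoneme
    return None

if __name__ == "__main__":
    # Six phonemes are picked out by only four features: each phoneme is a
    # distinct combination of feature values.
    assert len(FEATURES) < len(PHONEMES)
    print(identify((1, 1, 1, 0)))  # -> "m"

Four features distinguish six phonemes here; real inventories are larger, but the economy of representing many discrete items as combinations of fewer features is the kind of thing the comment is pointing at.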
