Scott Alexander has started a discussion of a recent mechanistic interpretability paper over at Astral Codex Ten: Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. In response to a comment by Hollis Robbins I offered these remarks:
Though it is true, Hollis, that the more sophisticated neuroscientists long ago gave up any idea of a one-to-one relationship between neurons and percepts and concepts (the so-called "grandmother cell"), I think that Scott is right that "polysemanticity at the level of words and polysemanticity at the level of neurons are two totally different concepts/ideas." I think the idea of distinctive features in phonology is a much better analogy.
Thus, for example, English has 24 consonant phonemes and between 14 and 25 vowel phonemes depending on the variety of English (American, Received Pronunciation, and Australian), for a total of between 38 and 49 phonemes. But there are only 14 distinctive features in the account given by Roman Jakobson and Morris Halle in 1971. So, how is it that we can account for 38-49 phonemes with only 14 features?
Each phoneme is characterized by more than one feature. As you know, each phoneme is characterized by the presence (+) or absence (-) of a feature. The relationship between phonemes and features can thus be represented by a matrix having 38-49 columns, one for each phoneme, and 14 rows, one for each feature. Each cell is then marked +/- depending on whether or not the feature is present for that phoneme. Lévi-Strauss adopted a similar system in his treatment of myths in his 1955 paper, "The Structural Study of Myth." I used such a system in one of my first publications, "Sir Gawain and the Green Knight and the Semiotics of Ontology," where I was analyzing the exchanges in the third section of the poem.
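To make the bookkeeping concrete, here is a minimal sketch in Python of such a feature matrix. The feature names and +/- assignments below are a small invented subset for illustration, not Jakobson and Halle's actual analysis; the point is simply that a handful of binary features can distinguish far more phonemes than there are features.

```python
# A toy distinctive-feature matrix: rows are features, columns are phonemes.
# Feature values here are illustrative, not Jakobson & Halle's actual ones.

features = ["vocalic", "consonantal", "nasal", "voiced"]  # subset of the 14

# Hypothetical assignments for a few phonemes (+ = True, - = False)
phonemes = {
    "p": {"vocalic": False, "consonantal": True,  "nasal": False, "voiced": False},
    "b": {"vocalic": False, "consonantal": True,  "nasal": False, "voiced": True},
    "m": {"vocalic": False, "consonantal": True,  "nasal": True,  "voiced": True},
    "a": {"vocalic": True,  "consonantal": False, "nasal": False, "voiced": True},
}

# Print the matrix, one row per feature, one column per phoneme.
print("feature".ljust(12) + "  ".join(phonemes))
for f in features:
    print(f.ljust(12) + "  ".join("+" if phonemes[p][f] else "-" for p in phonemes))

# n binary features can in principle distinguish 2**n phonemes,
# so 14 features comfortably cover the 38-49 phonemes of English.
print(f"\n{len(features)} features distinguish up to {2**len(features)} phonemes")
```

Each phoneme, in other words, is picked out not by any single feature but by its whole column of pluses and minuses.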
Now, in the paper under consideration, we're dealing with many more features, but I suspect the principle is the same. Thus, from the paper: "Just 512 neurons can represent tens of thousands of features." The set of neurons representing a feature will be unique, but it will also be the case that features share neurons. Features are represented by populations, not individual neurons, and individual neurons can participate in many different populations. In the case of animal brains, Karl Pribram made that argument over 50 years ago, and he wasn't the first.
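Here is a toy numerical sketch of that point, my own illustration rather than the paper's method: random feature directions over 512 "neurons" stand in for learned population patterns. Many more directions than neurons can coexist, each readable with only modest interference from the others, which is the sense in which every feature is carried by a population and every neuron contributes to many features.

```python
import numpy as np

# Toy distributed representation: far more features than neurons.
# These directions are random stand-ins, not the features recovered
# by the paper's dictionary-learning procedure.

rng = np.random.default_rng(0)
n_neurons, n_features = 512, 20_000

# Each feature is a direction (population activity pattern) over 512 neurons.
feature_dirs = rng.standard_normal((n_features, n_neurons))
feature_dirs /= np.linalg.norm(feature_dirs, axis=1, keepdims=True)

# A single active feature is read out cleanly along its own direction...
activation = feature_dirs[0]                       # feature 0 is "on"
print("signal:", feature_dirs[0] @ activation)     # ~1.0

# ...while its overlap with the other 19,999 directions stays small.
interference = feature_dirs[1:] @ activation
print("mean |interference|:", np.abs(interference).mean())  # roughly 0.03-0.04
```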
Pribram argued that perception and memory were holographic in nature. The idea was given considerable discussion back in the 1970s and into the 1980s. In 1982 John Hopfield published a very influential paper on a similar theme, "Neural networks and physical systems with emergent collective computational abilities." I'm all but convinced that LLMs are organized along these lines and have been saying so in recent posts and papers.
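For readers who want a feel for what Hopfield meant by "emergent collective computational abilities," here is a bare-bones sketch of a Hopfield-style network, written by me for illustration rather than taken from his paper. Memories are stored across the entire weight matrix, not in any single unit, and a corrupted cue settles back toward the nearest stored pattern.

```python
import numpy as np

# Minimal Hopfield-style associative memory: distributed storage and recall.

rng = np.random.default_rng(1)
n = 100
patterns = rng.choice([-1, 1], size=(5, n))        # five stored memories

# Hebbian storage: every pattern leaves its trace on every connection.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

# Start from a corrupted copy of pattern 0 (30 of 100 bits flipped)...
state = patterns[0].copy()
flip = rng.choice(n, size=30, replace=False)
state[flip] *= -1

# ...and let the network settle by repeated synchronous updates.
for _ in range(10):
    state = np.sign(W @ state)
    state[state == 0] = 1

# With only five stored patterns, the cue typically relaxes back to pattern 0.
print("recovered pattern 0:", np.array_equal(state, patterns[0]))
```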
* * * * *
Addendum: Another possible example: I use 655+ tags here at New Savanna for over 9,600 posts. Some posts have only one or two tags, and some have a half dozen or more. Note, however, that I don't intend that a post's tag set function as an identifier for the post, or as a proxy for an identifier. But the tags do characterize the posts in some way.