Friday, January 26, 2024

Invariance and compression in LLMs

One way of thinking about what transformers do is compression: the transformer performs a simple operation on a corpus of texts in such a way that some property of the corpus is preserved in the model. What’s kept invariant between the training corpus and the compressed model?

I think it must be the relationships between concepts. Note that in specifying relationships I mean explicitly to differentiate that from meaning. The process of thinking about LLMs has brought me to think of meaning in the following way:

  1. Meaning has two major components, intention and semanticity.
  2. Semanticity has two components, relationality and adhesion.*

Intention resides in the relationship between the speaker and the listener and is not always derivable directly from the semantics (semanticity) of the utterance. Intention in this sense is outside the scope of LLMs. And, of course, there are those who believe that without intention there is no meaning. That’s a respectable philosophical position, but it leaves you helpless to understand what LLMs are doing.

By adhesion I mean whatever it is that links a concept to the world. There are lots of concepts which are defined more or less directly in terms of physical things. That grounding is not going to be captured in text-only LLMs. Of course, we now have LLMs linked to vision models, so the adhesion aspect of semantics is being picked up. Even in the universe of concrete concepts, we still have relationships between those concepts, and those relationships can be captured in language without directly involving the adhesions of the concepts themselves. That apples and oranges are both fruits is a matter of relationships among those three concepts and doesn’t require access to the adhesions of apples and oranges. And so forth and so on for a large number of concepts. Then we have abstract concepts, which can be defined entirely through patterns of other concepts, which may be concrete, abstract, or both.
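The apples-and-oranges point can be made concrete with a toy sketch (my own illustration, not from the post): a purely relational store of concepts with no "adhesion" to the physical world at all. The hypernym links below are hypothetical examples.

```python
# Purely relational knowledge: concepts linked to other concepts,
# with no grounding in physical things. The links are illustrative.
is_a = {
    "apple": "fruit",
    "orange": "fruit",
    "carrot": "vegetable",
}

def share_category(a, b):
    """Two concepts are related if they share a direct hypernym."""
    cat_a, cat_b = is_a.get(a), is_a.get(b)
    return cat_a is not None and cat_a == cat_b

print(share_category("apple", "orange"))  # True: both are fruits
print(share_category("apple", "carrot"))  # False
```

Nothing here "knows" what an apple looks or tastes like; the fruit relation is established entirely by the pattern of links, which is the sense in which relationality can be carried by text alone.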

So, relationality. The mechanisms of syntax are designed to map multi-dimensional relationality onto a one-dimensional string. Syntax, however, only governs relationships between items within a sentence. Even that’s not quite adequate, because sentences can consist of more than one clause: the relationship between independent clauses within a sentence is different from that between a dependent clause and the clause on which it depends. Etc. It’s complicated. And then we have the relationships between paragraphs, and so forth.

What I’m attempting to do is figure out a way of thinking about the dimensionality of the semantic system. More or less on general principle, one would like to know how to estimate that. Now, when I talk about the semantic system, I mean the semanticity of words. But transformers must deal with texts, and texts consist of sentences and paragraphs and so forth. Setting metaphorical structures aside, the meaning of a sentence is a composition over the meanings of the words in the sentence. But, as I understand it, a transformer is perfectly capable of relating the meaning of a sentence to a single point in its space. And it can do that with larger strings as well. And, of course, the ordinary mechanisms of language allow us to use a string to define a single word; that’s how abstract definition works.
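The claim that a transformer can relate a sentence of any length to a single point in its space can be sketched with a hypothetical toy example. Real transformers use learned, contextual embeddings; here I just mean-pool fixed random "token vectors" (a common pooling strategy, stood in for by random data) to show how strings of different lengths collapse to one point of fixed dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8          # dimensionality of the toy semantic space
vocab = {}       # token -> fixed random vector (stand-in for learned embeddings)

def token_vec(tok):
    """Assign each token a fixed random vector on first sight."""
    if tok not in vocab:
        vocab[tok] = rng.standard_normal(DIM)
    return vocab[tok]

def sentence_point(sentence):
    """Mean-pool token vectors: one point regardless of sentence length."""
    toks = sentence.lower().split()
    return np.mean([token_vec(t) for t in toks], axis=0)

p1 = sentence_point("apples and oranges are fruits")
p2 = sentence_point("a much longer string still lands on a single point in the same space")
assert p1.shape == p2.shape == (DIM,)
```

The point of the sketch is only structural: whatever the string's length, the representation is one vector in a space of fixed dimensionality, which is what makes it sensible to ask about the dimensionality of the semantic system in the first place.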

And that’s as far as I’m going to attempt to take this train of thought. Still, I do think we need to recognize a distinction between what’s happening within sentences (the domain of syntax), and what happens with collections of sentences. Beyond that, it seems to me that where we want to end up eventually is a way of thinking about the relationship between the dimensionality of our semantic space and the size of the corpus needed to resolve the invariant relations in that space.

More later.

*Note: The current literature recognizes a distinction between inferential and referential processing, due, I believe, to Diego Marconi, The neural substrates of inferential and referential semantic processing (2011). The functional significance is similar, but only similar, to my distinction between relationality and adhesion. Inferential processing depends on the relational structure of texts. Adhesion is about the physical properties of the world, affordances in J.J. Gibson’s terminology, that are used to establish referential meaning for concrete concepts. But it is also about the patterns of relationships through which the meaning of abstract concepts is established.
