Wednesday, January 31, 2024

A short note on LLMs and speech

The linguistic capacities of large language models (LLMs), such as ChatGPT, are remarkable. However, we should remember that they are also NOT characteristically human. Well, of course not; it’s a computer. But that’s not what I have in mind.

What I’m thinking is that human language is, first of all, speech, and speech is interactive. LLMs are, at best, weakly interactive, though one can “converse” with them in short exchanges.

It is rare for a person to deliver a long string of spoken words. What do I mean by long? I don’t know. But I’m guessing that if we examined a large corpus of spoken language gathered in natural settings, we’d find relatively few utterances over 100 words long, or even 50 words long. Storytellers will deliver long stretches of uninterrupted speech, but they work at it. It’s not something that comes ‘naturally’ in the course of speaking with others. Learning to do it requires System 2 thinking, though the actual oral delivery of a story is likely to be confined to System 1.

Humans do produce long strings of words, but that’s most likely during writing. And writing is not “natural.” One must deliberately learn the writing system in a way that’s quite different from acquiring a first language, and then one must learn to produce texts that are both relatively long, over 500 or 1000 words, and coherent. Thus the fact that LLMs can produce 200, 300, 500 or more words at a stretch is quite unusual. And this is all done in some approximation to System 1 mode.
