Wednesday, May 12, 2021

The Word Illusion [in NLP]

This is something I’ve been thinking about for a while. What you’re looking at as you’re reading this is a string of words, right? Well, yes, but if you want to get picky, no.

I want to get picky.

Words and word forms

Words as we ordinarily understand the term have meaning and syntactic affordances (that is, they are “parts of speech”), and take auditory, graphic, and gestural form. All you see on the screen are the graphic forms of words; the auditory and gestural forms aren’t there, and the meanings and syntactic affordances exist in your mind. The graphic forms elicit those meanings and syntactic affordances from you, and so you understand what I’m putting before you. The meanings you infer may or may not be a good match for the ones I have/had in mind.

A word form is no more the word in full than a photograph of, say, a flower is the flower in full. What interests me is what happens when intuitions based on practical experience with language, which implies words-in-full, are lurking behind the scenes in intellectual work based on word forms as evidence.

The word illusion in natural language processing

Various investigators (in AI, NLP, digital humanities, social science) have been and are doing remarkable things with computing over collections, often extremely large collections, of words. But they aren’t working with words in the full sense, only with digital encodings of word forms. These computational methods allow the investigators to infer interesting things about the meanings inherent in those collections. Where did those meanings come from? They certainly aren’t IN THERE in the collections of word forms. Rather, the investigators infer them to be there because, well, they know that ultimately each of those word forms is attached to or associated with one or more meanings. They are under the spell of the word illusion.
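To make the point concrete, here is a toy sketch in Python (the tiny corpus and the function are my own invention, not anyone’s actual method). It counts which word forms co-occur in the same sentence. Nothing in the computation touches meaning; it only manipulates strings. Yet the resulting counts look suggestive, and it is the reader who supplies the “meaning” in them.

```python
# Computing over word forms only: co-occurrence counts on bare strings.
from collections import Counter
from itertools import combinations

# A hypothetical three-sentence corpus, purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Count how often each pair of word forms appears in the same sentence.
pair_counts = Counter()
for sentence in corpus:
    forms = set(sentence.split())  # word forms: bare strings, nothing more
    for a, b in combinations(sorted(forms), 2):
        pair_counts[(a, b)] += 1

def profile(word):
    """Return the co-occurrence profile of a word form: still just
    string pairs and integers, with no meanings anywhere in the data."""
    return {pair: n for pair, n in pair_counts.items() if word in pair}

print(profile("cat"))
```

That “cat” and “dog” turn out to have similar profiles is a fact about the distribution of string tokens; reading semantic kinship into it is exactly the inference step the word illusion hides from view.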

In pointing out that those are only word forms I’m not telling anyone anything they don’t already know, but they don’t think about it enough. In their eagerness to see meaning they don’t think nearly enough about how computation over the structure of those texts produces such interesting results.

And I can see from how this is going that teasing this one out is going to be tough. Here’s a start, but only that:

Until we think explicitly about the word forms (only) [that is, telling ourselves over and over that we're only computing over (empty) word forms; there's nothing else there] we're not going to know what we're doing. In the case of AI engines like GPT-3, we think that it actually understands language to some extent while at the same time being puzzled at how it creates such convincing output. We are right to be amazed and puzzled by that output. But we are mistaken to think it understands anything, as it lacks the basis for understanding anything. We need to get beyond our amazement and initiate serious inquiry into just what GPT-3 is doing; see my working paper on GPT-3 for some clues. At the same time we should remind ourselves that most of the prose we produce is routine boilerplate.

My working paper, Toward a Theory of the Corpus, has some suggestions about what computational critics should do once they get over the word illusion. Can the patterns they find in those texts then be interpreted as telling us something about the mind? Why not? But only once you understand that, despite the limitations of working with only word forms, those patterns were still produced by the human mind and so must tell us something about it.

[Reworked from “Ramble into 2021,” April 7, 2021.]
