Thursday, July 28, 2022

The machine majority language problem [entropic text]

The linked article: Benjamin Bratton and Blaise Agüera y Arcas, The Model Is The Message, Noema, July 12, 2022.

The idea of synthetic language:

At what point is calling synthetic language “language” accurate, as opposed to metaphorical? Is it anthropomorphism to call what a light sensor does machine “vision,” or should the definition of vision include all photoreceptive responses, even photosynthesis? Various answers are found both in the histories of the philosophy of AI and in how real people make sense of technologies.

Synthetic language might be understood as one kind of synthetic media, a category that also includes synthetic image, video, sound and personas, as well as machine perception and robotic control. Generalist models, such as DeepMind’s Gato, can take input from one modality and apply it to another — learning the meaning of a written instruction, for example, and applying it to how a robot might act on what it sees.

This is likely similar to how humans do it, but also very different. For now, we can observe that people and machines know and use language in different ways. Children develop competency in language by learning how to use words and sentences to navigate their physical and social environment. For synthetic language, which is learned through the computational processing of massive amounts of data at once, the language model essentially is the competency, but it is uncertain what kind of comprehension is at work. AI researchers and philosophers alike express a wide range of views on this subject — there may be no real comprehension, or some, or a lot. Different conclusions may depend less on what is happening in the code than on how one comprehends “comprehension.”

Does this kind of “language” correspond to traditional definitions, from Heidegger to Chomsky? [...]

There are already many kinds of languages. There are internal languages that may be unrelated to external communication. There are bird songs, musical scores and mathematical notation, none of which have the same kinds of correspondences to real world referents. Crucially, software itself is a kind of language, though it was only referred to as such when human-friendly programming languages emerged, requiring translation into machine code through compilation or interpretation.

As Friedrich Kittler and others observed, code is a kind of language that is executable. It is a kind of language that is also a technology, and a kind of technology that is also a language. In this sense, linguistic “function” refers not only to symbol manipulation competency, but also to the real-world functions and effects of executed code. For LLMs in the world, symbolic functional competency, “comprehension,” and physical functional effects are mixed up and connected — not equivalent, but not really extricable either.
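
A toy Python sketch, not from the essay, makes that dual status concrete: the same string can be described as a sequence of symbols or executed for its effect (console output here, standing in for whatever actuator a model-written instruction might drive).

    # Illustrative only: one string, two readings.
    statement = "print('The lamp is now on.')"

    # Treated as language: inspect and describe it without running it.
    tokens = statement.replace("(", " ( ").replace(")", " ) ").split()
    print(f"As symbols: {len(tokens)} tokens -> {tokens}")

    # Treated as technology: executing it produces an effect in the world.
    exec(statement)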

What happens in a world where the quantity of text generated by LLMs exceeds that generated by humans by a wide margin?

Imagine that there is not simply one big AI in the cloud but billions of little AIs in chips spread throughout the city and the world — separate, heterogeneous, but still capable of collective or federated learning. They are more like an ecology than a Skynet. What happens when AI-powered things that speak human-based language outnumber actual humans? What if the ratio is not just two embedded machines communicating in human language for every human, but 10:1? 100:1? 100,000:1? We call this the Machine Majority Language Problem.

On the one hand, just as the long-term population explosion of humans and the scale of our collective intelligence have led to exponential innovation, would a similar innovation scaling effect take place with AIs, and/or with AIs and humans amalgamated? Even if so, the effects might be mixed. Success might be a different kind of failure. More troublingly, as that ratio increases, people’s ability to use such cognitive infrastructures to deliberately compose the world is likely to diminish as human languages evolve semi-autonomously of humans.

Nested within this is the Ouroboros Language Problem. What happens when language models are so pervasive that subsequent models are trained on language data largely made up of earlier models’ outputs? The snake eats its own tail, and a self-collapsing feedback effect ensues.
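
A toy simulation, again not from the essay, shows the dynamic under stated assumptions: each generation’s “model” is just a word-frequency distribution fit to a finite, slightly sharpened sample of the previous generation’s output, and the diversity of the language measurably shrinks.

    import math
    import random
    from collections import Counter

    random.seed(0)

    def entropy_bits(dist):
        """Shannon entropy of a probability distribution, in bits."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # Generation 0: a "human" corpus with a broad, uniform vocabulary.
    vocab = [f"word_{i}" for i in range(50)]
    model = {w: 1 / len(vocab) for w in vocab}

    for generation in range(1, 16):
        # The next model's training data consists only of samples drawn from
        # the current model; finite sampling plus a mild preference for
        # already-likely tokens (akin to decoding at temperature below 1)
        # stands in for the model-trained-on-model feedback loop.
        weights = [model[w] ** 1.5 for w in model]
        samples = random.choices(list(model), weights=weights, k=500)
        counts = Counter(samples)
        model = {w: c / 500 for w, c in counts.items()}
        print(f"gen {generation:2d}: surviving vocab = {len(model):2d}, "
              f"entropy = {entropy_bits(model):.2f} bits")

Nothing in the loop replenishes diversity; it only leaks out, which is the point of the ouroboros image.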

The resulting models may be narrow, entropic or homogeneous; biases may become progressively amplified; or the outcome may be something altogether harder to anticipate. What to do? Is it possible to simply tag synthetic outputs so that they can be excluded from future model training, or at least differentiated? Might it become necessary, conversely, to tag human-produced language as a special case, in the same spirit that cryptographic watermarking has been proposed for proving that genuine photos and videos are not deepfakes? Will it remain possible to cleanly differentiate synthetic from human-generated media at all, given their likely hybridity in the future?
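
One simplified version of the tagging idea, with an invented key and field names and no claim to being a real watermarking scheme: a cooperating model provider attaches a verifiable provenance tag to its outputs so that corpus builders can exclude or separate them before training.

    import hashlib
    import hmac

    # Illustrative only: a shared key held by a cooperating model provider.
    PROVIDER_KEY = b"example-provenance-key"

    def tag_synthetic(text: str) -> dict:
        """Label machine-generated text with a verifiable provenance tag."""
        mac = hmac.new(PROVIDER_KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()
        return {"text": text, "origin": "synthetic", "tag": mac}

    def is_declared_synthetic(record: dict) -> bool:
        """True if the record carries a valid synthetic-origin tag."""
        expected = hmac.new(PROVIDER_KEY, record["text"].encode("utf-8"),
                            hashlib.sha256).hexdigest()
        return (record.get("origin") == "synthetic"
                and hmac.compare_digest(record.get("tag", ""), expected))

    corpus = [
        {"text": "A paragraph written by a person.", "origin": "human", "tag": ""},
        tag_synthetic("A paragraph produced by a language model."),
    ]

    # Keep only text that is not declared synthetic when building training data.
    training_data = [r["text"] for r in corpus if not is_declared_synthetic(r)]
    print(training_data)

A scheme like this only covers providers that opt in and text that has not been edited or paraphrased, which is exactly the hybridity problem the last question raises.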

There’s much more at the link.
