
3. Interlude: GPT-3 as a model of the mind

Here are the key paragraphs from the previous post:
Let us notice, first of all, that language exists as strings of signifiers in the external world. In the case that interests us, those are strings of written characters that have been encoded into computer-readable form. Let us assume that the signifieds – which bear a major portion of meaning, no? – exist in some high dimensional network in mental space. This is, of course, an abstract space rather than the physical space of neurons, which is necessarily three dimensional. However many dimensions this mental space has, each signified exists at some point in that space and, as such, we can specify that point by a vector containing its value along each dimension.

What happens when one writes? Well, one produces a string of signifiers. The distance between signifiers on this string, and their ordering relative to one another, are a function of the relative distances and orientations of their associated signifieds in mental space. That’s where to look for Neubig’s isomorphic transform into meaning space. What GPT-3, and other NLP engines, do is examine the distances and ordering of signifiers in the string and compute over them so as to reverse engineer the distances and orientations of the associated signifieds in high-dimensional mental space.
The purpose of this post is simply to underline the seriousness of my assertion that we should treat the mind as a high-dimensional space and that, therefore, we should treat the high-dimensional parameter space of GPT-3 as a model of the mind. If you aren't comfortable with the idea, well, it takes a bit of time for it to settle down (I've been here before). This post is a way of occupying some of that time.

If it’s not a model of the mind, then what IS it a model of? “The language”, you say? Where does the language come from, where does it reside? That’s right, the mind.

It is certainly not a complete model of the mind. The mind, for example, is quite fluid and is capable of autonomous action. GPT-3 seems static and is only reactive. It cannot initiate action. Nonetheless, it is still a rich model.

I built plastic models as a kid, models of rockets, of people, and of sailing ships. None of those models completely captured the things they modeled. I was quite clear on that. I have a cousin who builds museum-class ship models from wood of various kinds, metal, cloth, paper, thread and twine (and perhaps some plastic here and there). They are much more accurate and aesthetically pleasing than the models I assembled from plastic kits as a kid. But they are still only models.

So it is with GPT-3. It is a model of the mind. We need to get used to thinking of it in those terms, dangerous as they may be. But, really, can the field get more narcissistic and hubristic than it already is?

* * * * *

This is not the first time I’ve been through this drill. I’ve been thinking about this, that, and the other in the so-called digital humanities since 2014; call it computational criticism. These investigations have used various kinds of distributional semantics – topic modeling, vector space semantics – to examine literary texts and populations of texts. The researchers don’t think about their language models as models of the mind; they’re just, well, you know, language models, models of texts. There’s some kind of membrane, some kind of barrier, that keeps us – them, me, you – from seeing these statistical models as models of the mind. They’re not the real thing, they’re stopgaps, approximations. Yes. And they are also models, as much models of the mind as a plastic battleship is a model of the real thing.

Why am I saying this? Like I said, to underline the seriousness of my assertion that we should treat the mind as a high-dimensional space. In a common formulation, the mind is what the brain does. The brain is a three-dimensional physical object.

It consists of roughly 86 billion neurons, each of which has roughly 10,000 connections with other neurons. The action at each of those synaptic junctures is mediated by upward of 100 neurochemicals. The number of states a system can take depends on 1) the number of elements it has, 2) the number of states each element can take, and 3) the dependencies among those elements. How many states can that system assume? We don't really know. Jillions, maybe zillions, maybe jillions of zillions. A lot.
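Just to get a feel for the scale, here’s a back-of-the-envelope calculation. It makes two deliberately crude assumptions – each synapse is a simple on/off switch and all synapses vary independently – both false, but good enough to show that even a drastically simplified brain has a state space far beyond astronomical:

```python
from math import log10

neurons = 86_000_000_000        # roughly 86 billion neurons
synapses_per_neuron = 10_000    # roughly 10,000 connections each
synapses = neurons * synapses_per_neuron

# Crude assumption: each synapse is either "on" or "off" and varies
# independently of the others. The number of configurations is then
# 2**synapses, which we report as a power of ten rather than computing outright.
digits = synapses * log10(2)
print(f"about 10^{digits:.3g} possible configurations")
# For comparison, the observable universe contains roughly 10^80 atoms.
```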

That is a state space of very high dimensionality. That state space is the mind. GPT-3 is a model of that.

* * * * *

I’ve written quite a bit about computational criticism, though nothing for formal academic publication. Here’s one paper to look at:

William Benzon, Virtual Reading: The Prospero Project Redux, Working Paper, Version 2, October 2018, 37 pp., https://www.academia.edu/34551243/Virtual_Reading_The_Prospero_Project_Redux.
Abstract: Virtual reading is proposed as a computational strategy for investigating the structure of literary texts. A computer ‘reads’ a text by moving a window N-words wide through the text from beginning to end and follows the trajectory that window traces through a high-dimensional semantic space computed for the language used in the text. That space is created by using contemporary corpus-based machine learning techniques. Virtual reading is compared and contrasted with a 40 year old proposal grounded in the symbolic computation systems of the mid-1970s. High-dimensional mathematical spaces are contrasted with the standard spatial imagery employed in literary criticism (inside and outside the text, etc.). The “manual” descriptive skills of experienced literary critics, however, are essential to virtual reading, both for purposes of calibration and adjustment of the model, and for motivating low-dimensional projection of results. Examples considered: Augustine’s Confessions, Heart of Darkness, Much Ado About Nothing, Othello, The Winter’s Tale.
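By way of illustration, here is a minimal sketch of the windowing procedure described in that abstract. It assumes a hypothetical `word_vectors` lookup table (in practice the semantic space is built with corpus-based machine learning, as the abstract says); the point is only to show what “following the trajectory of the window” amounts to:

```python
import numpy as np

def virtual_reading(text, word_vectors, window=200, step=50):
    """Slide a window `window` words wide through `text` and return the
    trajectory of window centroids in the semantic space defined by
    `word_vectors` (a dict mapping word -> NumPy vector). Words missing
    from the lookup table are simply skipped."""
    words = text.lower().split()
    trajectory = []
    for start in range(0, max(1, len(words) - window + 1), step):
        chunk = words[start:start + window]
        vecs = [word_vectors[w] for w in chunk if w in word_vectors]
        if vecs:
            trajectory.append(np.mean(vecs, axis=0))  # centroid of this window
    return np.array(trajectory)

# Hypothetical usage: distances between successive centroids show how fast
# the text is moving through semantic space at each point in the "reading".
# steps = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
```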
* * * * *

Posts in this series are gathered under this link: Rubicon-Waterloo.

On the nature of academic literary criticism as an intellectual discipline: text, form, and meaning [where we are now]

Since the late 1950s and early 1960s the discipline has made interpretation its central focus. The discipline’s job is to determine or at least comment on and explicate the meaning of texts. At the same time the discipline has failed to define just what the text is and has arrived at no consensus about form, though formalism is a central conception. This makes sense. Form is a property of texts. If the discipline doesn’t know what a text is, then it can’t know what form is either.

Correlatively, the distinction between reading, in the ordinary sense of the word, and reading, as explicit interpretation, has become elided so that reading can mean either or both depending on the context. This too makes sense, though we need to think about it and come at it from another angle.

That other angle starts with a straight-forward definition of the text as a string of symbols. Form then becomes a matter of how words and syllables are arranged on the string. That’s my starting point; that’s the starting point for the naturalist study of literary morphology [1]. With that as your starting point, how do you get to meaning?

There are two ways.

The way of the critic

Here’s the standard way, the way of the critic: You can simply read the text, in the ordinary sense of read. This makes you a reader, obviously. To act as a literary critic, using this as your starting point, you go on to interpret the text. That is, you read it, in the sense the word has come to have among critics.

This process renders the text, as a string of symbols, all but invisible, and so obscures form – the arrangement of symbols in the string – as well. There are, of course, exceptions. In poetry rhyme and meter are obvious and so attract comment. But it is their very obviousness that makes it easy to subordinate that comment to interpretive exegesis. In prose we have the distinction between story and plot, which is so prominent in Tristram Shandy. This too is so obvious that it doesn’t threaten the hegemony of interpretive meaning.

The way of the modeler [speculative engineer]

That’s one way. There is another. That’s what I pursued, starting with Lévi-Strauss on myth and, to a lesser extent, Jakobson on poetics. That led me first to “Kubla Khan”, where I ran up against a very elaborate formal structure [2], form in the sense of the arrangement of words and syllables on the string. Despite all the attention that had been given to this poem, that formal structure had been missed.

That formal structure seemed computational to me and so I went off to graduate school in search of the computational underpinnings of literary texts. I produced a partial computational model of a single text, Shakespeare’s Sonnet 129 [3]. Take that model, flesh it out in various ways, embody it in an appropriate computer program, and then, yes, the computational engine reads the text. You could then, at least in principle, open the engine and see what it did in the course of reading the text. In a paper I co-authored with David Hays in 1976 we imagined such an engine and I fully anticipated that I would one day be working with it, as would others [4]. That didn’t happen, nor do I expect it to any time in the near future, if ever.

This path leads you to models, but also to the description of form [5]. But you do not end up with interpretations. Those you must forswear.

Which way?

There is a trade-off. If you seek meaning, then you take the way of the critic, but subordinate the text and form. If you seek form, then you must subordinate meaning and pursue the way of the modeler. There is no reason, of course, why one individual can’t pursue both paths, sometimes in different investigations, sometimes at different points in the same investigation. I have investigated this trade-off in an open letter I wrote to Daniel Everett, who was an academic dean at the time [6].

I see no reason why academic literary criticism should give up the way of the critic. But it will wither and die if it does not attend more deeply to the way of the modeler.

Note on terms and citations

This short post makes many assertions about literary criticism. I believe that I have discussed all of them in various posts and working papers. But I have decided not to cite those many pieces in this post, which would have turned it into a bit of a slog to write. It’s Friday morning, I’m exhausted from thinking and writing about GPT-3. Enough. If you have a mind to do so you can hunt those pieces down through the links I attach to the post.

I don’t particularly like the term “modeler”, but I can’t offer another at the moment. In the past I’ve talked of “naturalist criticism” and so of the “naturalist critic.” That suffers from the use of “critic”. I’ve taken the term “speculative engineer” from the preface to my book on music, Beethoven’s Anvil (2001), where I describe my method as speculative engineering. I do like that, but I can’t see it being used as a term of art.

That being said, this is pretty much how I see things at the moment. If I changed the tone a bit I could turn it into a manifesto.

References

[1] William Benzon, Literary Morphology: Nine Propositions in a Naturalist Theory of Form, PsyArt: An Online Journal for the Psychological Study of the Arts, August 2006, Article 060608, https://www.academia.edu/235110/Literary_Morphology_Nine_Propositions_in_a_Naturalist_Theory_of_Form.

[2] William Benzon, Articulate Vision: A Structuralist Reading of "Kubla Khan", Language and Style, Vol. 8: 3-29, 1985, https://www.academia.edu/8155602/Articulate_Vision_A_Structuralist_Reading_of_Kubla_Khan_.

[3] William Benzon, Cognitive Networks and Literary Semantics, MLN, Vol. 91, 1976, pp. 952-982, https://www.academia.edu/235111/Cognitive_Networks_and_Literary_Semantics.

[4] William Benzon and David Hays, Computational Linguistics and the Humanist, Computers and the Humanities, Vol. 10, 1976, pp. 265-274, https://www.academia.edu/1334653/Computational_Linguistics_and_the_Humanist.

[5] William Benzon, Description as Intellectual Craft in the Study of Literature, Working Paper, September 2017, 35 pp., https://www.academia.edu/4262467/Description_as_Intellectual_Craft_in_the_Study_of_Literature.

[6] William Benzon, An Open Letter to Dan Everett about Literary Criticism, June 2017, 24 pp., https://www.academia.edu/33589497/An_Open_Letter_to_Dan_Everett_about_Literary_Criticism.

GPT-3 does philosophy



Here are the essays GPT-3 was responding to at The Daily Nous.

Thursday, July 30, 2020

Collective decision making among spider monkeys

Language learning in children


A bunch of tweets go here. Then:



Wednesday, July 29, 2020

2. The brain, the mind, and GPT-3: Dimensions and conceptual spaces

[Edited, with a substantial addition, August 2, 2020]

The purpose of this post is to sketch a conceptual framework in which we can understand the success of language models such as GPT-3 despite the fact that they are based on nothing more than massive collections of bare naked signifiers. There’s not a signified in sight, much less any referents. I have no intention of even attempting to explain how GPT-3 works. That it does work, in an astonishing variety of cases though certainly not universally, is sufficient for my purposes.

First of all I present the insight that sent me down this path, a comment by Graham Neubig in an online conversation that I was not a part of. Then I set that insight in the context of an insight by Sydney Lamb (meaning resides in relations), a first-generation researcher in machine translation and computational linguistics. I then take a grounding case from Julian Michael, that of color, and suggest that it can be extended by the work of Peter Gärdenfors on conceptual spaces.

A clue: an isomorphic transform into meaning space

At the 58th Annual Meeting of the Association for Computational Linguistics Emily M. Bender and Alexander Koller delivered a paper, Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data [1], where NLU means natural language understanding. The issue is pretty much the one I laid out in my previous posts in the sections “No words, only signifiers” and “Martin Kay, ‘an ignorance model’” [2]. A lively discussion ensued online, which Julian Michael has summarized and commented on in a recent blog post [3].

In that post Michael quotes a remark by Graham Neubig:
One thing from the twitter thread that it doesn’t seem made it into the paper... is the idea of how pre-training on form might learn something like an “isomorphic transform” onto meaning space. In other words, it will make it much easier to ground form to meaning with a minimal amount of grounding. There are also concrete ways to measure this, e.g. through work by Lena Voita or Dani Yogatama... This actually seems like an important point to me, and saying “training only on form cannot surface meaning,” while true, might be a little bit too harsh— something like “training on form makes it easier to surface meaning, but at least a little bit of grounding is necessary to do so” may be a bit more fair.
That’s my point of departure in this post, that notion of “an ‘isomorphic transform’ onto meaning space.” I am going to sketch a framework in which we can begin unpacking that idea. But it may take awhile to get there.
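Neubig notes that there are concrete ways to measure something like this. Purely to give the flavor of such a measurement – this is my own toy sketch, not the method used by Voita or Yogatama – one can fit a linear map from form-only embeddings to a small set of grounded vectors and ask how much of the grounded geometry carries over:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: form-only embeddings for 500 words (say, from a model
# trained on text alone) and grounded vectors for the same 500 words (say,
# perceptual color coordinates). Random stand-ins here, for illustration only.
form_embeddings = rng.normal(size=(500, 50))
grounded_vectors = rng.normal(size=(500, 3))

# Fit a linear transform W minimizing ||form_embeddings @ W - grounded_vectors||.
W, *_ = np.linalg.lstsq(form_embeddings, grounded_vectors, rcond=None)
predicted = form_embeddings @ W

# How much of the grounded geometry does the form-derived space recover?
ss_res = np.sum((predicted - grounded_vectors) ** 2)
ss_tot = np.sum((grounded_vectors - grounded_vectors.mean(axis=0)) ** 2)
print(f"variance explained: {1 - ss_res / ss_tot:.2f}")
```

With random stand-ins the score is low; the interesting question is how high it climbs when the embeddings come from a real language model.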

Meaning is in relations

I want to develop an idea I have from Sydney Lamb, that meaning resides in relations. The idea is grounded in the “old school” world of symbolic computation, where language is conceived as a relational network of items. The meaning of any item in the network is a function of its position in the network.

Let’s start with this simple diagram:


It represents the fact that the central nervous system (CNS) is coupled to two worlds, each external to it. To the left we have the external world. The CNS is aware of that world through various senses (vision, hearing, smell, touch, taste, and perhaps others) and we act in that world through the motor system. But the CNS is also coupled to the internal milieu, with which it shares a physical body. The CNS is aware of that milieu through chemical sensors indicating the contents of the blood stream and of the lungs, and through sensors in the joints and muscles. And it acts on that milieu through control of the endocrine system and the smooth muscles. Roughly speaking the CNS guides the organism’s actions in the external world so as to preserve the integrity of the internal milieu. When that integrity is gone, the organism is dead.

Now consider this more differentiated presentation of the same facts:


I have divided the CNS into four sections: A) senses the external world, B) senses the internal milieu, C) guides action in the internal milieu, and D) guides action in the external world. I rather doubt that even a very simple animal, such as C. elegans, with 302 neurons, is so simple. But I trust my point will survive that oversimplification.

Lamb’s point is that the “meaning” or “significance” of any of those nodes – let’s not worry at the moment whether they’re physical neurons or more abstract entities – is a function of its position in the entire network, with its inputs from and outputs to the external world and the inner milieu [4]. To appreciate the full force of Lamb’s point we need to recall the diagrams typical of old school symbolic computing, such as this diagram from Brian Phillips we used in the previous post:
All of the nodes and edges have labels. Lamb’s point is that those labels exist for our convenience, they aren’t actually a part of the system itself. If we think of that network as a fragment from a human cognitive system – and I’m pretty sure that’s how Phillips thought about it, even if he could not justify it in detail (no one could, not then, not now) – then it is ultimately connected to both the external world and the inner milieu. All those labels fall away; they serve no purpose. Alas, Phillips was not building a sophisticated robot, and so those labels are necessary fictions.

But we’re interested in the full real case, a human being making their way in the world. In that case let us assume that, for one thing, the necessary diagram is WAY more complex, and that the nodes and edges do not represent individual neurons. Rather, they represent various entities – sensations, thoughts, perceptions, and so forth – that are implemented in neurons. Just how such things are realized in neural structures is, of course, a matter of some importance and is being pursued by hundreds of thousands of investigators around the world. But we need not worry about that now. We’re about to fry some rather more abstract fish (if you will).

Some of those nodes will represent signifiers, to use the Saussurian terminology I used in my previous post, and some will represent signifieds. What’s the difference between a signifier and a signified? Their position in the network as a whole. That’s all. No more, no less. Now, it seems to me, we can begin thinking about Neubig’s “isomorphic transform” onto meaning space.

Let us notice, first of all, that language exists as strings of signifiers in the external world. In the case that interests us, those are strings of written characters that have been encoded into computer-readable form. Let us assume that the signifieds – which bear a major portion of meaning, no? – exist in some high dimensional network in mental space. This is, of course, an abstract space rather than the physical space of neurons, which is necessarily three dimensional. However many dimensions this mental space has, each signified exists at some point in that space and, as such, we can specify that point by a vector containing its value along each dimension.

What happens when one writes? Well, one produces a string of signifiers. The distance between signifiers on this string, and their ordering relative to one another, are a function of the relative distances and orientations of their associated signifieds in mental space. That’s where to look for Neubig’s isomorphic transform into meaning space. What GPT-3, and other NLP engines, do is examine the distances and ordering of signifiers in the string and compute over them so as to reverse engineer the distances and orientations of the associated signifieds in high-dimensional mental space.
[A little reflection on that formulation makes it clear that it fails to take into account a distinction central to ‘old school’ symbolic computation, that between semantic and episodic memory. Rather than interrupt this argument with a refined formulation I have placed that in an appendix to this post: A more refined approach to meaning space. I also offer some remarks on the need for a connection to the physical world in order to handle common-sense reasoning.]
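To make that reverse-engineering intuition a bit more concrete, here is a minimal distributional-semantics sketch – emphatically not how GPT-3 works, but the same family of idea: tabulate which signifiers occur near which in the string, then factor that table so that each word gets a position in a low-dimensional space, with words that keep similar company landing near one another.

```python
import numpy as np
from collections import Counter

def cooccurrence_space(tokens, window=4, dims=2):
    """Count co-occurrences of words within `window` positions of one another,
    then use a truncated SVD of the count matrix to assign each word a point
    in a `dims`-dimensional space."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(index[w], index[tokens[j]])] += 1
    M = np.zeros((len(vocab), len(vocab)))
    for (a, b), c in counts.items():
        M[a, b] = c
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return vocab, U[:, :dims] * S[:dims]   # one point per word

tokens = "the boat overturned and the boat sank while the crew swam ashore".split()
for word, point in zip(*cooccurrence_space(tokens)):
    print(f"{word:12s} {point}")
```

On a corpus of any size, words whose signifieds are close in mental space tend to turn up in similar contexts, and so end up close in the recovered space as well.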
Is the result perfect? Of course not – but then how do we really know? It’s not as though we’ve got a well-accepted model of human conceptual space just lying around on a shelf somewhere. GPT-3’s language model is perhaps as good as we’ve got at the moment, and we can’t even open the hood and examine it. We know its effectiveness by examining how it performs. And it performs very well.

Interlude: Slocum’s Pilot and Sensory Deprivation

Bumping this to the top because this is fundamental work. It's from July 2011, but I read La Barre back in the middle to late 1970s. I'm thinking of it now in connection with GPT-3.
The material below the asterisks is from my notes and has, as its point of departure, one of my touchstone texts, a passage from Weston La Barre’s The Ghost Dance, a classic anthropological study of the origins of religion. It was written before the era of evolutionary psychology and so doesn’t go at origins in that way. Yet it manages to be consistently interesting and insightful.

I’m posting this because it’s relevant to both Heart of Darkness and to Apocalypse Now, which focus on people trying to make sense of a world that is not comfortable and familiar to them.

* * * * *

Early in The Ghost Dance Weston La Barre considers what happens to the mind under various conditions of deprivation. Consider this passage about Captain Joshua Slocum, who sailed around the world alone at the turn of the 20th Century:
Once in a South Atlantic gale, he double-reefed his mainsail and left a whole jib instead of laying-to, then set the vessel on course and went below, because of a severe illness. Looking out, he suddenly saw a tall bearded man, he thought at first a pirate, take over the wheel. This man gently refused Slocum’s request to take down the sails and instead reassured the sick man he would pilot the boat safely through the storm. Next day Slocum found his boat ninety-three miles further along on a true course. That night the same red-capped and bearded man, who said he was the pilot of Columbus’ Pinta, came again in a dream and told Slocum he would reappear whenever needed.
La Barre goes on to cite similar experiences happening to other explorers and to people living in isolation, whether by choice, as in the case of religious meditation, or force, as in the case of prisoners being brainwashed.

In the early 1950s Woodburn Heron, a psychologist in the laboratory of Donald Hebb, conducted some of the earliest research on the effects of sensorimotor deprivation. The subjects were placed on a bed in a small cubicle. They wore translucent goggles that transmitted light, but no visual patterns. Sound was masked by the pillow on which they rested their heads and by the continuous hum of air-conditioning equipment. Their arms and hands were covered with cardboard cuffs and long cotton gloves to blunt tactile perception. They stayed in the cubicle as long as they could, 24 hours a day, with brief breaks for eating and going to the bathroom.

The results were simple and dramatic. Mental functioning, as measured by simple tests administered after 12, 24, and 48 hours of isolation, deteriorated. Subjects lost their ability to concentrate and to think coherently. Most dramatically, subjects began hallucinating. The hallucinations would begin with simple forms and designs and evolve into whole scenes. One subject saw dogs, another saw eyeglasses, and they had little control over what they saw; no matter how hard they tried, they couldn’t change what they were seeing. A few subjects had auditory and tactile hallucinations. Upon emerging from isolation the visual world appeared distorted, with some subjects reporting that the room appeared to be moving. Heron concluded, as have other investigators, that the waking brain requires a constant flux of sensory input in order to function properly.

Of course, one might object to this conclusion by pointing out that, in particular, these people were deprived of interaction with other people and that is what causes the instability, not mere sensory deprivation. But, from our point of view, that is no objection at all. For other people are a major part of the environment in which human beings live. The rhythms of our intentional structures are stable only if they are supported by the rhythms of the external world. Similarly, one might object that, while these people were cut off from the external physical world, their brains, of course, were still operating in the interior milieu. Consequently the instabilities they experienced reflect “pressure” from the interior milieu that is not balanced by activity in the external world. This may well be true, I suspect that it is, but it is no objection to the idea that the waking brain requires constant input from the external world in order to remain stable. Rather, this is simply another aspect of that requirement.

Thus I suggest that detaching one’s attention from the immediate world to “think” may cause problems. And yet it is the capacity for such thought that is one aspect of the mental agility that distinguishes us from our more primitive ancestors. How do we keep the nervous system stable enough to think coherently?

Conrad’s Kurtz faced one strange world; Coppola’s faced a different one. But neither could adapt their thoughts to the world they actually faced. So they sought coherence in adapting that world to their thoughts. Each failed.

Japanese fishermen's coats

Mars or bust!

Three cheers for the Democratic Republic of the Congo [formerly known among colonialists as "darkest Africa"]

Tyler Cowen has a conversation with Nathan Nunn, a development economist at Harvard. Among other things, the conversation touches on the Democratic Republic of the Congo which, as some of you may know, is the locus of a series of posts on something I call Kisangani 2150. Take the world Kim Stanley Robinson established in New York 2140, run it forward a decade, and imagine what Kisangani would be like. Key post: Kisangani 2150, or Reconstructing Civilization on a New Model.
COWEN: If you try to think, say, within Africa, what would be some places that you would be modestly more optimistic about than, say, a hedge fund manager who didn’t understand persistence? What would a few of those countries be? Again, recognizing enormous noise, variance, and so on, as with smoking and lung cancer.

NUNN: If I’m true to exactly what I was just saying, then southern Africa or places where you have a larger population of societies that historically were more developed. South Africa, you have the Afrikaans, and they have a different descent than others. That’s if I’m true to what I was saying. But that’s ignoring that, also within Africa, you had a very large number of successful, well-developed states, and that was prior to European colonialism and the slave trade. So one could look at those cases. 
One area that I worked at, the Democratic Republic of Congo, where you had the great Congo Kingdom, the Kuba Kingdom, a large number of other kingdoms, the Luba for example — that would probably be one country. That country today is pretty much as low as — in terms of per capita income — as you can be, right at subsistence. But if we’re predicting just based purely on persistence and historical state formation, that would be one to pick.
On crime – and fun:
COWEN: Why do you think many parts of the New World — and I have in mind Latin America — have relatively high levels of crime for their per capita income? Latin America also, as you know, has pretty high levels of education for its per capita income. There may be trust at some micro levels, but crime rates in the New World are much higher than anywhere else. Crime rates in Latin America very often are higher than in most parts of Africa.

What has gone wrong there in terms of the intergenerational transmission of trust? And of course, it’s multi-ethnic, but so is much of Africa.

NUNN: That’s a good question. I haven’t thought about that. And also, I obviously know less about Latin America. One is, I’m not sure that it’s related to trust. I think it’s related to whatever tools and mechanisms a society can employ to constrain activities which we call crime.

I can tell you more about what happens in sub-Saharan Africa and the Democratic Republic of Congo, where I spend much of my time. There, you wouldn’t think that the formal institutions are better than in Brazil, for example. The police force is less well functioning, but the crime rates — we were very surprised when we first went to the areas where we stay — are extremely, extremely low.

So what is it? It’s not through formal mechanisms, but it’s through informal mechanisms such that you could almost think of it as mob justice, that if one person commits a crime, there’s going to be informal actions taken to punish that individual. That relies on the strength of indigenous, informal institutions or social structures that prevent that. 
In Latin America, it seems like the reliance are on these more modern, formal institutions which aren’t as good as other countries. The other thing about Latin America — I would say there’s extreme inequality. We see this in national Gini coefficients. That’s different than countries where you’re very close to subsistence, and the scope for inequality is much less. I would just guess that that has a big part to do with it as well. But those are all just conjectures.

COWEN: Is it fun to visit Democratic Republic of Congo?

NUNN: Yeah, it’s great. Yeah.

COWEN: Tell us what’s fun. I need to go once I can.

NUNN: Yeah, it’s really, really great. The first time we went as a team — this is James Robinson, Sara Lowes, Jonathan Weigel in 2013 — we were pretty apprehensive. You hear a lot of stories about the DRC. It sounds like a very unsafe place, et cetera. But one thing we didn’t realize or weren’t expecting was just how lovely and wonderful the people are.

And it turns out it’s not unsafe in general. It depends on different locations. In the east, definitely near Goma, it’s obviously much, much less safe. But I think what, for me, is wonderful is the sense of community. Because the places we go are places that haven’t been touched, to a large extent, by foreign aid or NGOs or tourism, I think we are treated just like any other individual within the community.

In the psychology literature, it’s often referred to as collectivist versus individualist culture. I think it’s just a culture where the individual is less important. You’re more embedded in the community, their social relations, and I think that’s nice. It’s nice to experience that — coming from a Western society — for a month every year. [...]

Actually, what I really enjoy is just going back to the different parts of the DRC, kind of on a regular basis. Given that you develop bonds with different people, that’s really nice to see them, see how they’re doing over time, and that sort of thing.

COWEN: There’s a recent online piece by Morgan Kelly. I’m sure you know it. It’s called “The Standard Errors of Persistence,” and it’s pretty technical. Feel free to give us an answer that no one will understand, but he says, “Many persistence regressions can strongly predict spatial noise.” What do you think of this piece?

NUNN: I think it has an important lesson, which is, we have to really be careful when we’re thinking about societies, or people, or anything really — institutions, policies. Because in the cross-section especially . . . well, actually, not only cross-section but in the time series, there’s a lot of correlation across observations.

If you looked at, for example, the eastern DRC, those groups, those individuals there are going to have a lot of similar experiences as just across the border in Rwanda, and they’re going to be culturally somewhat similar. The further you move away, the more independent they are, but the closer spatially you are, the more correlated they are. So if we’re looking at any correlations and there are these omitted factors, then if you’re close to one another, your error terms are going to be more similar.

That’s basically an important point of that paper. If you don’t take that into account — and it’s hard because there’s a lot we don’t know — if you don’t take that into account, you can get a lot of false positives. And part of that comes from overestimating the effective number of observations that you have.
Music:
COWEN: What do you like best in African music?

NUNN: I’m not super familiar with African music, except for the local Congolese music, which is —

COWEN: Well, that’s one of the peaks, right?

NUNN: Yeah, exactly. I like it. It’s fun. There’s memories that bring me back to the first road trip we did, when we went to visit the Kuba Kingdom. It was in this SUV that we rented, and we had these tapes playing with Congolese music, and that was great. We even had the air conditioning for about 10 minutes. Then the tape machine caught on fire, and then the air conditioning broke down.

But that music still reminds me of that trip, which was a two- or three-day trip, or actually four- or five-day round trip into the interior, which was my first trip to the Congo. What I don’t like is it’s usually associated with dancing, and I’m a very stiff, rigid person [laughs] that’s not skilled at dancing.
There's much more at the link.

Einstein and thought experiments [+ Maxwell's demon]



Sabine Hossenfelder, Einstein’s Greatest Legacy: Thought Experiments, Backreaction, July 25, 2020:
Einstein’s greatest legacy is not General Relativity, it’s not the photoelectric effect, and it’s not slices of his brain. It’s a word: Gedankenexperiment – that’s German for “thought experiment”.

Today, thought experiments are common in theoretical physics. We use them to examine the consequences of a theory beyond what is measureable with existing technology, but still measureable in principle. Thought experiments are useful to push a theory to its limits, and doing so can reveal inconsistencies in the theory or new effects. There are only two rules for thought experiments: (A) relevant is only what is measureable and (B) do not fool yourself. This is not as easy as it sounds.

The maybe first thought experiment came from James Maxwell and is known today as Maxwell’s demon. Maxwell used his thought experiment to find out whether one can beat the second law of thermodynamics and build a perpetual motion machine, from which an infinite amount of energy could be extracted.

Yes, we know that this is not possible, but Maxwell said, suppose you have two boxes of gas, one of high temperature and one of low temperature. If you bring them into contact with each other, the temperatures will reach equilibrium at a common temperature somewhere in the middle. In that process of reaching the equilibrium temperature, the system becomes more mixed up and entropy increases. And while that happens – while the gas mixes up – you can extract energy from the system. It “does work” as physicists say. But once the temperatures have equalized and are the same throughout the gas, you can no longer extract energy from the system. Entropy has become maximal and that’s the end of the story.

Maxwell’s demon now is a little omniscient being that sits at the connection between the two boxes where there is a little door. Each time a fast atom comes from the left, the demon lets it through. But if there’s a fast atom coming from the right, the demon closes the door. This way the number of fast atoms on the one side will increase, which means that the temperature on that side goes up again and the entropy of the whole system goes down.

It seems like thermodynamics is broken, because we all know that entropy cannot decrease, right? So what gives? Well, the demon needs to have information about the motion of the atoms, otherwise it does not know when to open the door. This means, essentially, the demon is itself a reservoir of low entropy. If you combine demon and gas the second law holds and all is well. The interesting thing about Maxwell’s demon is that it tells us entropy is somehow the opposite of information, you can use information to decrease entropy. Indeed, a miniature version of Maxwell’s demon has meanwhile been experimentally realized.

But let us come back to Einstein. Einstein’s best known thought experiment is that he imagined what would happen in an elevator that’s being pulled up. Einstein argued that there is no measurement that you can do inside the elevator to find out whether the elevator is in rest in a gravitational field or is being pulled up with constant acceleration. This became Einstein’s “equivalence principle”, according to which the effects of gravitation in a small region of space-time are the same as the effects of acceleration in the absence of gravity. If you converted this principle into mathematical equations, it becomes the basis of General Relativity.
There is more at the link.
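As an aside – my own toy illustration, not something from Hossenfelder’s post – the demon’s sorting is easy to mimic numerically. Treat each “atom” as nothing but a kinetic energy, let the demon pass fast atoms one way and slow atoms the other, and a temperature difference appears out of an initially uniform gas:

```python
import random

random.seed(0)

# Two boxes of gas; each particle is represented only by its kinetic energy.
left = [random.gauss(1.0, 0.3) ** 2 for _ in range(5000)]
right = [random.gauss(1.0, 0.3) ** 2 for _ in range(5000)]

def mean(xs):
    return sum(xs) / len(xs)

threshold = mean(left + right)  # the demon's notion of a "fast" atom

for _ in range(200_000):
    if random.random() < 0.5 and left:
        i = random.randrange(len(left))
        if left[i] > threshold:        # fast atom approaches from the left: open the door
            right.append(left.pop(i))
    elif right:
        i = random.randrange(len(right))
        if right[i] < threshold:       # slow atom approaches from the right: open the door
            left.append(right.pop(i))

print(f"mean energy, left box:  {mean(left):.3f}")   # lower than it started
print(f"mean energy, right box: {mean(right):.3f}")  # higher than it started
```

The catch, as the quoted post explains, is that the demon has to know each atom’s speed, and that information is itself a store of low entropy.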

Monday, July 27, 2020

Woman, kimono, cat

1. No meaning, no how: GPT-3 as Rubicon and Waterloo, a personal view

I say that not merely because I am a person and, as such, have a point of view on GPT-3 and related matters. I say it because the discussion is informal, without journal-class discussion of this, that, and the other, along with the attendant burden of citation, though I will offer a few citations. Moreover, I’m pretty much making this up as I go along. That is to say, I am trying to figure out just what it is that I think, and see value in doing so in public.

What value, you ask? It commits me to certain ideas, if only at a certain time. It lays out a set of priors and thus serves to sharpen my ideas as developments unfold and I, inevitably, reconsider.

GPT-3 represents an achievement of a high order; it deserves the attention it has received, if not the hype. We are now deep in “here be dragons” territory and we cannot go back. And yet, if we are not careful, we’ll never leave the dragons, we’ll always be wild and undisciplined. We will never actually advance; we’ll just spin faster and faster. Hence GPT-3 is both a Rubicon, the crossing of a threshold, and a potential Waterloo, a battle we cannot win.

Here’s my plan: First we take a look at history, at the origins of machine translation and symbolic AI. Then I develop a fairly standard critique of semantic models such as those used in GPT-3, which I follow with some remarks by Martin Kay, one of the Grand Old Men of computational linguistics. Then I look at the problem of common-sense reasoning and conclude by looking ahead to the next post in this series, in which I offer some speculations on why (and perhaps even how) these models can succeed despite their severe and fundamental shortcomings.

Background: MT and Symbolic computing

It all began with a famous memo Warren Weaver wrote in 1949. Weaver was director of the Natural Sciences division of the Rockefeller Foundation from 1932 to 1955. He collaborated with Claude Shannon on a book that popularized Shannon’s seminal work in information theory, The Mathematical Theory of Communication. Weaver’s 1949 memorandum, simply entitled “Translation” [1], is regarded as the catalytic document in the origin of machine translation (MT) and hence of computational linguistics (CL) and heck! why not? artificial intelligence (AI).

Let’s skip to the fifth section of Weaver’s memo, “Meaning and Context” (p. 8):
First, let us think of a way in which the problem of multiple meaning can, in principle at least, be solved. If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words. “Fast” may mean “rapid”; or it may mean "motionless"; and there is no way of telling which.

But if one lengthens the slit in the opaque mask, until one can see not only the central word in question, but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word. The formal truth of this statement becomes clear when one mentions that the middle word of a whole article or a whole book is unambiguous if one has read the whole article or book, providing of course that the article or book is sufficiently well written to communicate at all.
It wasn’t until the 1960s and ‘70s that computer scientists would make use of this insight; Gerard Salton was the central figure and he was interested in document retrieval [2]. Salton would represent each document as a vector of words and then query a database of such representations using a vector composed from user input. Documents were retrieved as a function of the similarity between the query vector and the stored document vectors.
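Here is a minimal sketch of that Salton-style scheme, using raw term counts and cosine similarity (the real systems of that era used more refined weightings, tf-idf and the like, but the principle is the same):

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a document as a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "machine translation of Russian scientific articles",
    "retrieval of documents ranked by vector similarity",
    "the formal structure of literary texts",
]
doc_vectors = [vectorize(d) for d in documents]

# Query the little "database": documents come back in order of similarity.
query = vectorize("document retrieval by similarity")
for i in sorted(range(len(documents)), key=lambda i: cosine(query, doc_vectors[i]), reverse=True):
    print(f"{cosine(query, doc_vectors[i]):.2f}  {documents[i]}")
```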

Work on MT went a different way. Various approaches were used, but at some relatively early point researchers were writing formal grammars of languages. In some cases these grammars were engineering conveniences while in others they were taken to represent the mental grammars of humans. In any event, that enterprise fell apart in the mid-1960s. The prospects for practical results could not justify federal funding, and the government had little interest in supporting purely scientific research into the nature of language.

But such research continued nonetheless, sometimes under the rubric of computational linguistics (CL) and sometimes as AI. I encountered CL in graduate school in the mid-1970s when I joined the research group of David Hays in the Linguistics Department of the State University of New York at Buffalo – I was actually enrolled as a graduate student in English; it’s complicated.

Many different semantic models were developed, but I’m not interested in anything like a review of that work, just a little taste. In particular I am interested in a general type of model known as a semantic or cognitive network. Hays had been developing such a model for some years in conjunction with several graduate students [2]. Here’s a fragment of a network from a system developed by one of those students, Brian Phillips, to tell whether or not stories of people drowning were tragic [3]. Here’s a representation of capsize:
Notice that there are two kinds of nodes in the network, square ones and smaller round ones. The square ones represent a scene while the round ones represent individual objects or events. Thus the square node at the upper left indicates a scene with two sub-scenes – I’m just going to follow out the logic of the network without explaining it in any detail. The first one asserts that there is a boat that contains one Horatio Smith. The second one asserts that the boat overturns. And so forth through the rest of the diagram.

This network represents semantic structure. In the terminology of semiotics, it represents a network of signifieds. Though Phillips didn’t do so, it would be entirely possible to link such a semantic network with a syntactic network, and many systems of that era did so.
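Purely by way of illustration – this is not Phillips’s actual notation – the capsize fragment might be encoded as a small graph along these lines, with “scene” nodes grouping the object and event nodes and labeled links carrying the relations:

```python
# A hypothetical, simplified encoding of the capsize fragment.
network = {
    "scene:capsize":             {"subscenes": ["scene:boat-contains-smith", "scene:boat-overturns"]},
    "scene:boat-contains-smith": {"relation": "contain", "args": ["boat", "horatio-smith"]},
    "scene:boat-overturns":      {"relation": "overturn", "args": ["boat"]},
    "boat":                      {"isa": "watercraft"},
    "horatio-smith":             {"isa": "person"},
}

def entities_in(scene, net):
    """Walk a scene node and its sub-scenes, collecting every entity mentioned."""
    node = net[scene]
    found = list(node.get("args", []))
    for sub in node.get("subscenes", []):
        found.extend(entities_in(sub, net))
    return found

print(entities_in("scene:capsize", network))  # ['boat', 'horatio-smith', 'boat']
```

Nothing in the program depends on those labels meaning anything; they are conveniences for the human reader, a point that will matter later in this series.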

Such networks were symbolic in the (obvious) sense that the objects in them were considered to be symbols, not sense perceptions or motor actions nor, for that matter, neurons, whether real or artificial. The relationship between such systems and the human brain was not explored, either in theory or in experimental observation. It wasn’t an issue.

That enterprise collapsed in the mid-1980s. Why? The models had to be hand-coded, which took time. They were computationally expensive and so-called common sense reasoning proved to be endless, making the models larger and larger. (I discuss common sense below and I have many posts at New Savanna on the topic [4].)

Oh, the work didn’t stop entirely. Some researchers kept at it. But interests shifted toward machine learning techniques and toward artificial neural networks. That is the line of evolution that has, three or four decades later, resulted in systems like GPT-3, which also owe a debt to the vector semantics pioneered by Salton. Such systems build huge language models from huge databases – GPT-3 is based on 500 billion tokens [5] – and contain no explicit models of syntax or semantics anywhere, at least not that researchers can recognize.

Researchers build a system that constructs a language model (“learns” the language), but the inner workings of that model are opaque to the researchers. After all, the system built the model, not the researchers. They only built the system.

It is a strange situation.

AI, and computers more generally, violate basic conceptual categories

Iris Berent, Op-Ed: The real reason we’re afraid of robots, LA Times, July 26, 2020:
If you saw a ball start rolling all by itself, you’d be astonished. But you wouldn’t be the least bit surprised to see me spontaneously rise from my seat on the couch and head toward the refrigerator.

That is because we instinctively interpret the actions of physical objects, like balls, and living agents, like people, according to different sets of principles. In our intuitive psychology, objects like balls always obey the laws of physics —they move only by contact with other objects. People, in contrast, are agents who have minds of their own, which endow them with knowledge, beliefs, and goals that motivate them to move on their own accord. We thus ascribe human actions, not to external material forces, but to internal mental states.

Of course, most modern adults know that thought occurs in the physical brain. But deep down, we feel otherwise. Our unconscious intuitive psychology causes us to believe that thinking is free from the physical constraints on matter. Extensive psychological testing shows that this is true for people in all kinds of societies. The psychologist Paul Bloom suggests that intuitively, all people are dualists, believing that mind and matter are entirely distinct.

AI violates this bedrock belief. Siri and Roomba are man-made artifacts, but they exhibit some of the same intelligent behavior that we typically ascribe to living agents. Their acts, like ours, are impelled by information (thinking), but their thinking arises from silicon, metal, plastic and glass. While in our intuitive psychology thinking minds, animacy and agency all go hand in hand, Siri and Roomba demonstrate that these properties can be severed — they think, but they are mindless; they are inanimate but semiautonomous.
David Hays and I pointed this out some time ago in our paper, "The Evolution of Cognition" (1990):
One of the problems we have with the computer is deciding what kind of thing it is, and therefore what sorts of tasks are suitable to it. The computer is ontologically ambiguous. Can it think, or only calculate? Is it a brain or only a machine?

The steam locomotive, the so-called iron horse, posed a similar problem for people at Rank 3. It is obviously a mechanism and it is inherently inanimate. Yet it is capable of autonomous motion, something heretofore only within the capacity of animals and humans. So, is it animate or not? Perhaps the key to acceptance of the iron horse was the adoption of a system of thought that permits separation of autonomous motion from autonomous decision. The iron horse is fearsome only if it may, at any time, choose to leave the tracks and come after you like a charging rhinoceros. Once the system of thought had shaken down in such a way that autonomous motion did not imply the capacity for decision, people made peace with the locomotive.

The computer is similarly ambiguous. It is clearly an inanimate machine. Yet we interact with it through language; a medium heretofore restricted to communication with other people. To be sure, computer languages are very restricted, but they are languages. They have words, punctuation marks, and syntactic rules. To learn to program computers we must extend our mechanisms for natural language.

As a consequence it is easy for many people to think of computers as people. Thus Joseph Weizenbaum, with considerable dis-ease and guilt, tells of discovering that his secretary “consults” Eliza—a simple program which mimics the responses of a psychotherapist—as though she were interacting with a real person (Weizenbaum 1976). ... We still do, and forever will, put souls into things we cannot understand, and project onto them our own hostility and sexuality, and so forth.

When the Federales do not know the law, how can they legitimately police? [the Portland story]

Over the past few days, millions of people have seen a now-viral video in which two federal agents dressed in full combat gear removed an apparently peaceful protester from the streets of Portland, Ore., and carried him away in an unmarked van.

Stories have emerged of other people being taken or pursued by federal agents in a similar fashion. Meanwhile, troubling videos show federal agents in Portland beating a peacefully resolute U.S. Navy veteran and, on a separate occasion, shooting a man in the face with a nonlethal munition, which broke his skull.

As criticism of these events rolled in—including from virtually every relevant state and local official in Oregon—the Department of Homeland Security scheduled a press conference earlier this week to try to reclaim the narrative. If the point of that press conference was to reassure an anxious nation that this unfamiliar and recently constituted federal police force is following the law, it likely achieved the opposite effect.

In particular, there is a two-minute segment of the press conference that is both revealing and highly disturbing. It shows that one of the top commanders of this new paramilitary federal police force—Kris Cline, Deputy Director of the Federal Protective Service—apparently does not know what the word “arrest” means. To say as much might seem like harping on semantics or, worse, like picking on Cline for speaking inartfully. But it is absolutely critical to unpack and examine Cline’s words—because the word arrest is one of the most important words in the constitutional law of policing.

Simply put, for an arrest to be constitutional it must be supported by probable cause. This means that the arresting officer must be able to point to specific facts that would cause a reasonable officer to believe that the person being arrested has committed a specific crime. If, on the other hand, the police have not arrested someone but have instead conducted only a brief investigatory stop, they need substantially less proof that the target of their attention is engaged in criminal activity. And if the police initiate instead what is often termed a consensual contact—as would occur if, say, a uniformed officer walked up to you and said, “hey, I want to ask you some questions”—well, in that case the Fourth Amendment simply does not apply, which means the officer does not need to have any reason to approach you.
Prof. Timothy Snyder of Yale: It’s very troubling. To say that the man was not arrested is simply lying. This is what authoritarian propaganda sounds like. A man has been arrested and you find some other way to describe it, for example, as a ‘simple engagement,’ which is false but it sounds like a technical term. So you stop and think about it. That’s how authoritarian propaganda works.
Arrests, stops and contacts carve up the universe of police-civilian interactions in the United States. So, when I say that Deputy Director Cline does not appear to know what the word “arrest” means, what I am really saying is that he does not know where the basic and essential legal lines are that mark the bounds of his agency’s lawful authority. That is a problem.
Crespo then goes on to point out in some detail why Cline was wrong in his reasoning.
There is an odd, disorienting quality to Cline’s two-minute statement. I have no reason to question Cline’s integrity or motives. But on its face, his statement feels like a kind of criminal procedure version of gaslighting. With an earnest, “just the facts” style, Cline is clearly trying to convince the public that what happened in Portland is not a big deal.

The agents were “peaceful,” he said. “There was no tackle to the ground.” This was just “a simple engagement.” It is unfortunate, Cline tells us, that this all “got kinda spun out of control with the rhetoric about what happened,” as if the people questioning the legality of the arrest are the ones blowing this all out of proportion. After all, Cline reminds us, “it was not a custodial arrest.”

Except it was.

Sunday, July 26, 2020

Is capitalism over?

John Quiggin, The end of interest, Crooked Timber, 26 July 2020.
Although my book-in-progress is called The Economic Consequences of the Pandemic, a lot of it will deal with changes that were already underway, and have only been accelerated by the pandemic. This was also true of Keynes’ Economic Consequences of the Peace. The economic order destroyed by the Great War was already breaking down, as was discussed for example, in Dangerfield’s Strange Death of Liberal England.

Amid all the strange, alarming and exciting things that have happened lately, the fact that real long-term (30-year) interest rates have fallen below zero has been largely overlooked. Yet this is the end of capitalism, at least as it has traditionally been understood. Interest is the pure form of return to capital, excluding any return to monopoly power, corporate control, managerial skills or compensation for risk.

If there is no real return to capital, then there is no capitalism. In case it isn’t obvious, I’ll make the point in subsequent posts that there is no reason to expect the system that replaces capitalism (I’ll call it plutocracy for the moment) to be an improvement.

Saturday, July 25, 2020

Walt Disney, Stephen Miller and the Future of Jersey City

I'm bumping this 2013 post to the top of the queue on general principle. But also, BECAUSE. Who'd believe an expert with a laser cutter was living in the ghetto and included international starchitect Norman Foster among his clients? Not that I think of it as a ghetto. It's where I lived for two and a half years and felt very comfortable, at home, thank you very much.

* * * * *

Buildings ... are not discrete objects. They are building blocks of a democratic society. W. H. Auden once proposed that a civilization could be judged by "the degree of diversity attained and the degree of unity attained." In the spirit of service, architecture can contribute to both. Without the spirit of service, architecture can be a highly destructive force.

– Herbert Muschamp, Visions of Utopia
No doubt you are familiar with Walt Disney, the guy who made cartoons and nature documentaries, created the world’s first theme park, and gave his name to what is now the world’s largest entertainment company. But it’s been years since Disney himself appeared in the media – he died in 1966 – and his life story isn’t well-known, though there must be at least a dozen biographies of him (I’ve read four of them).

But what does Uncle Walt have to do with Stephen Miller and what do either of them have to do with the future of Jersey City?

And, by the way, WHO is Stephen Miller?

I don’t know how many laser cutters there are in Jersey City – 10, 20, 100, 763? I have no idea – but one of them is in his atelier off Harrison Street between Monticello and Bergen.

What’s a laser cutter?

It’s a high tech jigsaw used for cutting materials such as wood, plastic, leather, metal perhaps.

And what the h___ is an atelier?

It’s a workshop and design studio. A high-class term.

OK, gotcha, but what does that have to do with Walt Disney and what do they have to do with the future of Jersey City?

Let’s start with Walt Disney. Disney was an entertainer; he made movies and went on to build a theme park. Miller is an entertainer too, though of a different kind. He’s a musician and a very good MC – he tells me he used to front a band. And he’s a slammin’ djembe player.

And I know a little about djembe players. When I lived in upstate New York I performed with Eddie “Ade” Knowles, a percussionist who toured with Gil Scott-Heron early in his career. I hear and feel the same power and nuance in Miller’s djembe playing that Ade has in his.

OK, so he’s an entertainer, there are lots of entertainers in the world...

Just cool your jets. Don’t go getting testy on me. I’m gettin’ there.

Take a look at this video (embedded below). It’s a promotional video that Disney prepared for Epcot (Experimental Prototype Community of Tomorrow) and it shows a small city that’s very different from and far more interesting than what the Disney Company eventually built in central Florida.



About 16 minutes in you’ll see an architectural model for the central building and transportation hub that Disney had planned (but that never got built). Well, those models don’t grow on trees – though such models often feature model trees. Someone has to build them. That’s how Miller makes his living; he makes architectural models.

An architect, say Norman Foster, has a client who’s about to sink $100,000,000 into a project. The client wants to see what he’s getting for all those spondulix, and the client wants to see more than plans and pretty 2D renderings. Plans don't really mean much to anyone but architects and engineers and pretty 2D renderings are, well, pretty, but they don't really give you a sense of the space. What the client really wants, of course, is to walk around and through the finished structure before it’s actually built and paid for. That’s not possible. But it is possible to build a scale model.

That’s Miller’s job. Foster shoots over the plans in Autocad format, Miller feeds them into the laser cutter, and the cutter spits out the parts. Miller assembles the parts into a model, sprinkles some artificial grass on the ground, plants artificial trees and shazayum! finished model, ready for client viewing.

Now, long before he even thought of Disneyland, much less Epcot, Walt Disney built models. As I put it in Walt Disney: A Career in Three Acts:
In the late 1940s Walt turned his attention to model railroads. Not only did he give them as gifts to himself, and his nieces and nephews, he also learned to construct them. He was already a respectable carpenter - skills he had learned from his father. Now he learned how to fabricate the metal parts necessary for a 1/8 scale railroad which was constructed in the yard at his new house. This railroad was his pride and joy; he loved operating the engine - which was a real steam engine, though not full-sized - and giving people rides on the train...

At the same time - the late 1940s - Disney began thinking about creating a “Mickey Mouse Park” on sixteen acres of land near the studio. The original purpose was to have a place where studio employees and visitors could park their children. But, as Disney thought about the park, and investigated amusement parks here and there, his aspirations became ever more elaborate, and ever different from standard amusement parks of the Coney Island type.
And thus Disneyland was born, a theme park that James Rouse – the real estate developer perhaps best-known for creating Columbia, Maryland – called “the greatest piece of urban design in the United States today” in a 1963 speech at Harvard University.

Hollis Robbins on Close Talking, Episode #104 Freedom Rider: Washout (by James Emanuel)



Connor and Jack are joined by Dr. Hollis Robbins, Dean of the School of Arts & Humanities at Sonoma State University and author of the newly published "Forms of Contention: Influence and the African American Sonnet Tradition" from the University of Georgia Press.

They discuss the poem "Freedom Rider: Washout" by James Emanuel, touching on the memory of Rep. John Lewis, one of the original freedom riders, the reasons the sonnet has such a rich history of use by Black poets, and much more.

Find out more about Forms of Contention, here: ugapress.org/book/9780820357645…rms-of-contention/
Freedom Rider: Washout
By: James Emanuel

The first blow hurt.
(God is love, is love.)
My blood spit into the dirt.
(Sustain my love, oh, Lord above!)
Curses circled one another.
(They were angry with their brother.)

I was too weak
For this holy game.
A single freckled fist
Knocked out the memory of His name.
Bloody, I heard a long, black moan,
Like waves from slave ships long ago.
With Gabriel Prosser’s dogged knuckles
I struck an ancient blow.
Published in Stephen Henderson, Understanding the New Black Poetry: Black Speech and Black Music as Poetic References (NY: Morrow, 1973), 237.