Tuesday, September 12, 2017

Virtual Reading: The Prospero Project Redux [#DH]

I've uploaded another working paper. Title above; abstract, table of contents, and introduction below. The introduction runs long, but there's some good stuff there.

Download at:

* * * * *
Abstract: Virtual reading is proposed as a computational strategy for investigating the structure of literary texts. A computer ‘reads’ a text by moving a window N words wide through the text from beginning to end and following the trajectory that window traces through a high-dimensional semantic space computed for the language used in the text. That space is created using contemporary corpus-based machine learning techniques. Virtual reading is compared and contrasted with a 40-year-old proposal grounded in the symbolic computation systems of the mid-1970s. High-dimensional mathematical spaces are contrasted with the standard spatial imagery employed in literary criticism (inside and outside the text, etc.). The “manual” descriptive skills of experienced literary critics, however, are essential to virtual reading, both for calibration and adjustment of the model and for motivating low-dimensional projections of the results. Examples considered: Heart of Darkness, Much Ado About Nothing, Othello, The Winter’s Tale.

Introduction: Prospero Redux and Virtual Reading 2
In search of a small-world net: Computing an emblem in Heart of Darkness 8
Virtual reading as a path through a multidimensional semantic space 11
Reply to a traditional critic about computational criticism: Or, It’s time to escape the prison-house of critical language [#DH] 17
After the thrill is gone...A cognitive/computational understanding of the text, and how it motivates the description of literary form [Description!] 23
Appendix: Prospero Elaborated 30

Introduction: Prospero Redux and Virtual Reading

In a way, this working paper is a reflection on four decades of work in the study of language, mind, and literature. Not specifically my work, though, yes, certainly including my work. I say in a way, for it certainly doesn’t attempt to survey the relevant literature, which is huge, well beyond the scope of a single scholar. Rather, I compare a project I had imagined back then (Prospero), mostly as a thought experiment, but also with some hope that it would in time be realized, with what has turned out to be a somewhat revised version of that project (Prospero Redux), a version I believe to be doable, though I don’t alone possess the skills, much less the resources, to do it.

The rest of this working paper is devoted to Prospero Redux, the revised version. This introduction compares it with the 40-year-old Prospero. This comparison is a way of thinking about an issue that’s been on my mind for some time: Just what have we learned in the human sciences over the last half-century or so? As far as I can tell, there is no single theoretical model on which a large majority of thinkers agree in the way that all biologists agree on evolution. The details are much in dispute, but there is no dispute that the world of living things is characterized by evolutionary dynamics. The human sciences have nothing comparable (though there is a move afoot to adopt evolution as a unifying principle for the social and behavioral sciences). If we don’t have even ONE such theoretical model, just what DO we know? And yet there HAS been a lot of interesting and important work over the last half-century. We must have learned something, no?

Let’s take a look.

Prospero, 1976

Work in machine translation started in the early 1950s [1]; George Miller published his classic article, “The Magical Number Seven, Plus or Minus Two” in 1956; Chomsky published Syntactic Structures in 1957; and we can date artificial intelligence (AI) to a 1956 workshop at Dartmouth [2]. That’s enough to characterize the beginnings of the so-called “Cognitive Revolution” in the human sciences. I encountered that revolution, if you will, during my undergraduate years at Johns Hopkins in the 1960s, where I also encountered semiotics and structuralism. By the early 1970s I was in graduate school in the English Department at The State University of New York at Buffalo, where I joined the research group of David Hays in the Linguistics Department. Hays was a Harvard-educated cognitive scientist who’d headed the machine translation program at the RAND Corporation in the 1950s.

At that time a number of research groups were working on cognitive or semantic network models for natural language semantics. It was bleeding edge research at the time. I learned the model Hays and his students had developed and applied it to Shakespeare’s Sonnet 129 (which I touch on a bit later, pp. 21 ff.). At the same time I was preparing abstracts of the current literature in computational linguistics for The American Journal of Computational Linguistics. Hays edited the journal and had a generous sense of the relevant literature.

Thus when Hays was invited to review the field of computational linguistics for Computers and the Humanities it was natural for him to ask me to draft the article. I wrote up the standard kind of review material, including reports and articles coming out on the Defense Department’s speech understanding project, which was perhaps the single largest research effort in the field (I discuss this as well, pp. 20 ff.). But we aspired to more than just a literature review. We wanted a forward-looking vision, something that might induce humanists to look deeper into the cognitive sciences.

We ended the article with a thought experiment (p. 271):
Let us create a fantasy, a system with a semantics so rich that it can read all of Shakespeare and help in investigating the processes and structures that comprise poetic knowledge. We desire, in short, to reconstruct Shakespeare the poet in a computer. Call the system Prospero.

How would we go about building it? Prospero is certainly well beyond the state of the art. The computers we have are not large enough to do the job and their architecture makes them awkward for our purpose. But we are thinking about Prospero now, and inviting any who will to do the same, because the blueprints have to be made before the machine can be built. [...]

The general idea is to represent the requisite world knowledge – what the poet had in his head – and then investigate the structure of the paths which are taken through that world view as we move through the object text, resolving the meaning of the text into the structure of conceptual interrelationships which is the semantic network. Thus the Prospero project includes the making of a semantic network to represent Shakespeare’s version of the Elizabethan world view.
But a model of the Elizabethan world view was “only the background”. We would also have to model Shakespeare’s mind (p. 272):
A program, our model of Shakespeare’s poetic competence, must move through the cognitive model and produce fourteen lines of text. [...] The advantage of Prospero is that it takes the cognitive model as given – clearly and precisely – and the poetic act as a motion through the model. Instead of asking how the words are related to one another, we ask how the words are related to an organized collection of ideas, and the organization of the poem is determined, then, by the world view and poetics in unison. [3]
We declined to predict when such a marvel might be possible, though I expected to see something within my lifetime. Not something that would rival the Star Trek computer, mind you, not something that could actually think in some robust sense of the word. But something.

What we got some 35 years later was an IBM computer system called Watson that defeated humans in playing Jeopardy [4]. Watson was a marvel, but was and is nowhere near to doing what Hays and I had imagined for Prospero. Nor do I see that old vision coming to life in the foreseeable future.

Moreover, Watson is based on a newer kind of technology, quite different from that which Hays and I had reviewed in our article and which we were imagining for Prospero. Prospero came out of a research program, symbolic computing, that all but collapsed a decade later. It was replaced by technology that had a more stochastic character, which involved machine learning, and which, in some increasingly popular versions, was (somewhat distantly) inspired by real nervous systems. It is this newer technology that runs Google’s online machine translation system, that runs Apple’s Siri, and that is behind much of the work in computational literary criticism.

Before turning to that, however, I want to say just a bit more about what we most likely had in mind – I say “most likely” because that was a LONG time ago and I don’t remember all that was whizzing through my head at the time. We were out to simulate the human mind, to produce a system that was, in at least some of its parts and processes, like the parts and processes of the mind. One could have Prospero read and even write texts while keeping records of what it does. One could then examine those records and thus learn how the mind works. Ambitious? Yes. But the computer simulation of cognitive tasks is quite common in the cognitive sciences, though not on THAT scale. In contrast, Watson, for example, was not intended as a simulation of the mind. It was a straight-up engineering activity. What matters for such systems, and for AI generally, is whether or not the system produces useful results. Whether or not it does so in a human way is, at best, a secondary consideration.

Why didn’t Prospero, or anything like it, happen? For one thing, such systems tended to be brittle. If you get something even a little bit wrong, the whole thing collapses. Then there’s combinatorial explosion; so many alternatives have to be considered on the way to a good one that the system just runs out of time – that is, it just keeps computing and computing and computing [...] without reaching a result. That’s closely related to what is called the “common sense” problem. No text is ever complete. Something must always be inferred in order to make smooth connections between the words in the text. Humans have vast reserves of such common sense knowledge; computing systems do not. How do they get it? Hand coding – which takes time and time and time. And when the system calls on the common sense knowledge that’s been hand-coded into it, what happens? Combinatorial explosion.

The enterprise of simulating a mind through symbolic computing simply collapsed. In the case of something like Prospero I would add that it now seems to me that, to tell us something really useful about the mind, such a system would have to simulate the human brain. Hays and I didn’t realize that at the time – we’d just barely begun to think about the brain – but it became obvious in retrospect some years later.

Prospero Redux

Why, then, Prospero Redux?

It’s a different world.

As I said, computational linguists and AI researchers left the world of symbolic computing. No more writing hand-coded rules and patterns. Instead researchers built on the insight that words achieve specific meaning in context, so why not examine patterns of words-in-context and see what you can do with that? In some techniques the only context that matters is the text itself; all structure is thrown away, leaving us with a so-called bag of words. Other techniques consider consecutive strings: 2 words (bi-grams), 3 words (tri-grams), N words (n-grams), whatever. It turns out that when you do this with hundreds and thousands of texts consisting of billions of words you can get very interesting results. Thus, for example, Google’s current language translation capabilities are based on such technology and are more successful than anything created back in the years of machine translation based on symbolic programming. To be sure, the translations are by no means perfect – you wouldn’t want to use them for legal documents, for example – but for some purposes they are adequate.
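To make the bag-of-words and n-gram notions concrete, here is a minimal sketch in Python. The function names and the toy text are mine; real systems apply the same operations to corpora of billions of words.

```python
from collections import Counter

def bag_of_words(text):
    """Throw away all word order; keep only word counts."""
    return Counter(text.lower().split())

def ngrams(text, n):
    """Extract consecutive n-word strings (bi-grams, tri-grams, ...)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

text = "the horror the horror"
print(bag_of_words(text))  # Counter({'the': 2, 'horror': 2})
print(ngrams(text, 2))     # [('the', 'horror'), ('horror', 'the'), ('the', 'horror')]
```

The bag of words records only that “the” and “horror” each occur twice; the bi-grams preserve a bit of local order, which is precisely the extra information n-gram techniques trade on.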

Just how this works, well, I’m not the one to provide simplified explanations, as my own understanding is a bit hazy. I do know, however, that the process involves spaces of high dimensionality, many more dimensions than the three of visual, motor, and haptic space. Let’s not worry about the details, important though they are. The fact is, when you’re working in this world you just have to trust that researchers with expertise you lack – expertise that may be quite different from your own – are competent.

As far as I can tell, the investigators who did the research that inspired the idea of virtual reading know what they’re doing (see discussion starting on page 10). In virtual reading one starts with a high-dimensional space encoding semantic relationships in the language of the target text. Then you take your target text and, starting from the beginning, you follow the text’s path through that high-dimensional space. It is my contention, which I explain in more detail later, that that trajectory will tell us something about the text, and the mind. Not everything by any means, not even the meaning of the text, but something about structure. It is not a simulation; it is both more abstract than that, as much of the information and structure you would need for a simulation isn’t there, and more concrete. More concrete in the sense that we can (I believe) do it.
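The mechanics of virtual reading can be sketched quite simply. Assume we already have a vector for each word (here a made-up 3-dimensional toy space; a real application would use vectors with hundreds of dimensions, learned from a large corpus). Slide a window N words wide through the text; each window position becomes one point, and the ordered points are the trajectory:

```python
def window_vectors(words, vectors, n):
    """Slide an n-word window through the text; each window becomes one
    point (the average of its word vectors) on the text's trajectory."""
    trajectory = []
    for i in range(len(words) - n + 1):
        vecs = [vectors[w] for w in words[i:i + n]]
        dim = len(vecs[0])
        point = [sum(v[d] for v in vecs) / n for d in range(dim)]
        trajectory.append(point)
    return trajectory

# Toy 3-dimensional "semantic space" -- these values are invented for
# illustration, not learned from any corpus.
vectors = {
    "river":   [0.9, 0.1, 0.0],
    "ivory":   [0.2, 0.8, 0.1],
    "station": [0.1, 0.3, 0.9],
    "my":      [0.3, 0.3, 0.3],
}
words = "my ivory my station my river".split()
path = window_vectors(words, vectors, 2)
print(len(path))  # 5 points, one per window position
```

Averaging the window’s word vectors is only one (crude) way to turn a window into a point; the essential idea is just that consecutive windows yield consecutive points, and the ordered points are the path the reading traces through the space.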

We can’t do a simulation like the one we had envisioned for Prospero back in the mid-1970s. This version of Prospero, Prospero Redux, is a different beast, one that reflects some of what we have learned in the last four decades. Prospero Redux was hardly imaginable back then, much less feasible.

Now, think a bit. These computational systems with their high-dimensional semantic spaces calculated over billions and billions of words of text, they know nothing. Yet they produce useful results once they’ve gone through those texts. How is that possible? Well, those billions and billions of words of text, they aren’t just random piles of words. Each of those texts was created by a human mind attempting to communicate something to another human mind (or perhaps itself at a later point in time). The order in those texts must therefore come from the human mind. From that it follows that the order in those high-dimensional spaces in some way reflects the structure of the minds that produced those texts. Specifically, the structure of the semantic systems – which many have attempted to model using the methods I described in the previous section and, again, later in this paper (pp. 25 ff.). Yes, I know, we’re each of us individuals, each different from the others. But language is a tool for communication; to serve that function we must use words in similar fashion. Some aspect of those similarities shows up in those machine-calculated semantic spaces.

The concept of high-dimensional spaces is as general as it is abstract. Physicists have been working with them for a long time. They’re foundational to statistical thermodynamics, where each particle in a system is assigned a dimension of its own [5]. The trick to working with high-dimensional spaces is to find low-dimensional phenomena in them. Thus when physicists talk about the phases of matter (solid, liquid, gas, plasma [6]) they will talk about temperature and pressure. Temperature is one dimension, pressure is another. Using just those two dimensions you can say quite a lot about matter even though you believe that systems of matter must ultimately be analyzed in terms of an enormous number of dimensions.

Neuroscientists use high-dimensional systems when they analyze the dynamic activity of nervous systems. Each element in the system (where an element could be a neuron or even a synapse) is assigned a dimension. Now we must be careful. The nervous system itself is a 3D physical system, as is the volume of gas a physicist might analyze. But we analyze it by creating an abstract analytic space where each active element in that physical system is a different dimension in the analytic space.

The late Walter Freeman [7] was a pioneer in investigating the dynamical activity of the nervous system. I corresponded with him while I was working on my book about music (Beethoven’s Anvil) and for several years afterward. Here’s a brief email exchange I had with him (I no longer have the original emails, so I’ve lost the date):

I've had another crazy idea. I've been thinking about Haken's remark that the trick to dealing with dynamical systems is to find phenomena of low dimensionality in them. What I think is that that is what poetic form does for language. The meaning of any reasonable hunk of language is a trajectory in a space of very high dimensionality. Poetic form "carves out" a few dimensions of that space and makes them "sharable" so that "I" and "Thou" can meet in aesthetic contemplation.

So, what does this mean? One standard analytic technique is to discover binary oppositions in the text and see how they are treated. In KK [“Kubla Khan”] Coleridge has a pile of them, human vs. natural, male vs. female, auditory vs. visual, expressive vs. volitional, etc. So, I'm thinking of making a table with one column for each line of the poem and then other columns for each of these "induced" dimensions. I then score the content of each line on each dimension, say +, - and 0. That set of scores, taken in order from first to last line, is the poem's trajectory through a low dimensional projection or compression of the brain's state space.

The trick, of course, is to pull those dimensions out of the EEG data. Having a sound recording of the reading might be useful. What happens if you use the amplitude envelope of the sound recording to "filter" the EEG data?


Bill B

Not crazy, Bill, but technologically challenging!
Will keep on file and get back to you.
Alas, he never did get back to me. For my purposes it is enough that he found the idea only technologically challenging (with an exclamation point!), not crazy.

Not crazy is all I’m looking for. Technologically challenging we can handle. If not now, then at some later date. The Singularity, you know, the time when computers become supersmart so they can outsmart us, I think that’s crazy. But virtual reading, that’s only technologically challenging. Just how challenging, I’m not in a position to say.

Once again, let’s think a bit. I’ve talked about high-dimensional semantic spaces in connection with virtual reading. I’m now talking about high-dimensional neurodynamic spaces in connection with the brain. But isn’t the brain the seat of the mind that writes the texts that are the basis of those high-dimensional semantic spaces? Isn’t that high-dimensional semantic space created by the brain and its high dimensional neural space?

What do I think of that?

Not crazy. But challenging.

What’s this have to do with the humanities?

I devote a section of this paper to that question: Reply to a traditional critic about computational criticism (starting on page 16). For many critics, there is no answer. They’ve got a conception of humanistic inquiry that excludes anything that smacks of science [8], that uses numbers, diagrams, or computers (the horror! the horror!).

For other critics, those willing to take a look, I have quite a bit to say in this paper. For example, take a look at the note I wrote to Freeman where I say, “I then score the content of each line on each dimension, say +, - and 0.” That has to be done by someone well-versed in the analytical skills of standard literary criticism. More generally I regard the descriptive skills grounded in extensive experience with texts as essential to this work (see, e.g. pp. 15 ff., pp. 21 ff.).
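The scoring scheme from that note to Freeman is simple enough to sketch. The critic supplies the oppositions (the dimensions) and the +, -, 0 judgments; the computation merely treats the ordered rows as a path. The oppositions below come from the note, but the scores are invented for illustration, not an actual reading of “Kubla Khan”:

```python
# Critic-supplied oppositions (columns) and per-line scores (rows):
# +1, -1, or 0 on each dimension. These particular scores are made up.
dimensions = ["human/natural", "male/female", "auditory/visual"]
scores = [
    [+1,  0, -1],  # line 1
    [-1,  0, -1],  # line 2
    [-1, +1, +1],  # line 3
]

def trajectory(scores):
    """The ordered rows ARE the poem's path through this small space;
    return the successive moves (differences) between adjacent lines."""
    return [[b - a for a, b in zip(row, nxt)]
            for row, nxt in zip(scores, scores[1:])]

print(trajectory(scores))  # [[-2, 0, 0], [0, 1, 2]]
```

The computation is trivial; the intellectual work is in choosing the oppositions and making the line-by-line judgments, which is exactly where the trained critic comes in.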

But there is something else, something deeper and subtler. And that has to do with the notion of computation itself. It is easy to think of computation as something that is fixed and over there somewhere. We embody this or that set of computational concepts in some model and then run our texts through the model. The computational concepts come from that over-there place and are unchanged by our activity.

I don’t think that’s how it will work out. The concept of computation is not yet fully worked out, nor is it likely to be fully worked out. I fully expect that the richest application of computation to literary texts will require the creation of new concepts originating from the properties of those texts. Literary computing must thus be more than the application of OTS (off the shelf) technology to literary investigation. It must necessarily entail purpose-built computational concepts and techniques and so will enrich our understanding of computing.

The rest of this paper

In search of a small-world net: Computing an emblem in Heart of Darkness: The phrase My Intended, my ivory, my station, my river, my— is central to the text. I suggest that it achieves that centrality by being the keystone, if you will, in the semantic space of the text. The construction in this paper starts from this point.

Virtual reading as a path through a multidimensional semantic space: I take the “seed” planted in the previous section and “grow” it into the idea of a virtual reading by calling on work done in 2006 that did something similar with 12 texts (9 fiction, 3 non-fiction). I develop the idea by considering, again, Heart of Darkness, and three Shakespeare plays (Much Ado About Nothing, Othello, and The Winter’s Tale).

Reply to a traditional critic about computational criticism: Or, It’s time to escape the prison-house of critical language [#DH]: The standard spatial imagery literary critics use to characterize texts – inside, outside, close, surface, etc. – is vague; it serves as the ground of intuitions for examining texts, but it traps critical thought in a critical style that is moribund. The cognitive sciences provide a much richer repertoire of concepts for thinking about texts (I provide a brief tour), and the minds that read and create them. These concepts are grounded in computation. Moreover, we need to develop more effective means of describing texts and their formal features.

After the thrill is gone...A cognitive/computational understanding of the text, and how it motivates the description of literary form [Description!]: We review some seminal work in cognitive science, specifically, the speech understanding project funded by the Department of Defense in the mid-1970s, and show how that work can lead to and provide a ground for intuitions about texts and their formal features.

Appendix: Prospero Elaborated: This is an elaboration of the Prospero project mentioned above. Among other things, I imagine modeling different readers, and modeling the process of providing an explicit interpretation of a text.


[1] Machine translation, Wikipedia. Accessed September 10, 2017: https://en.wikipedia.org/wiki/Machine_translation#History

[2] History of artificial intelligence, Wikipedia. Accessed September 10, 2017: https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Dartmouth_Conference_1956:_the_birth_of_AI

[3] I’ve attached a somewhat elaborated version of Prospero as an appendix.

[4] Watson (computer), Wikipedia. Accessed September 10, 2017: https://en.wikipedia.org/wiki/Watson_(computer)#Jeopardy.21

[5] Actually, each particle is assigned six dimensions. I discuss this in somewhat more detail (including the concepts of entropy and phase space) in my working paper, A Primer on Self-Organization: With some tabletop physics you can do at home, February 2014, 13 pp.: https://www.academia.edu/6238739/A_Primer_on_Self-Organization

[6] Phase (matter), Wikipedia. Accessed September 12, 2017: https://en.wikipedia.org/wiki/Phase_(matter)

[7] See, for example, this appreciation by Joel Frohlich, “Chaos, Meaning, and Rabbits: Remembering Walter J. Freeman III”, Knowing Neurons, Website. Accessed September 12, 2017: http://knowingneurons.com/2016/06/15/chaos-meaning-rabbits/

[8] I once wrote a blog post entitled “I don’t give a crap about science”, New Savanna. Accessed September 12, 2017: https://new-savanna.blogspot.com/2012/06/i-dont-give-crap-about-science.html
