Monday, April 14, 2014

From Quantification to Patterns in Digital Criticism

I would like to continue the examination of fundamental presuppositions, conceptual matrices, which I began in The Fate of Reading and Theory. That post was concerned with how, in the context of academic literary criticism, 1) “reading” elides the distinction between (merely) reading some text – for enjoyment, edification, whatever – and writing up an interpretation of that text, and 2) “literary theory” became the use of theory in interpreting literary texts. This post is about the common-sense association between computers and computing on the one hand and numbers and mathematics on the other.

* * * * *

Let’s start with a couple of sentences from one of the pamphlets published by Stanford’s Literary Lab, Ryan Heuser and Long Le-Khac, A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (May 2012, 68-page PDF):
The general methodological problem of the digital humanities can be bluntly stated: How do we get from numbers to meaning? The objects being tracked, the evidence collected, the ways they’re analyzed—all of these are quantitative. How to move from this kind of evidence and object to qualitative arguments and insights about humanistic subjects—culture, literature, art, etc.—is not clear.
There we have it, numbers on the one hand and meaning on the other. It’s presented as a gulf which the digital humanities must somehow cross.

When I first read that pamphlet I most likely thought nothing of that statement. It states, after all, a commonplace notion. But when I read those words in the context of writing a post about Alan Liu’s essay, “The Meaning of the Digital Humanities” (PMLA 128, 2013, 409-423), I came up short. “That’s not quite right,” I said to myself, “it’s wrong to so casually identify computers and computing with numbers.”

* * * * *

Now let’s take a look at an essay by Kari Kraus, Conjectural Criticism: Computing Past and Future Texts (DHQ: Digital Humanities Quarterly, 2009, Volume 3 Number 4). Here’s her opening paragraph:
In an essay published in the Blackwell Companion to Digital Literary Studies, Stephen Ramsay argues that efforts to legitimate humanities computing within the larger discipline of literature have met with resistance because well-meaning advocates have tried too hard to brand their work as "scientific," a word whose positivistic associations conflict with traditional humanistic values of ambiguity, open-endedness, and indeterminacy [Ramsay 2007]. If, as Ramsay notes, the computer is perceived primarily as an instrument for quantizing, verifying, counting, and measuring, then what purpose does it serve in those disciplines committed to a view of knowledge that admits of no incorrigible truth somehow insulated from subjective interpretation and imaginative intervention [Ramsay 2007, 479–482]?
Though I’ve got reservations about Ramsay’s proposals (see A Hothouse Manifesto: Does Stephen Ramsay Sell Literary Criticism Short?), I certainly have no problem with dissolving the bond that common sense has forged between the idea of the computer and those of numbers and math. Later on Kraus notes:
The essay develops a computational model of textuality, one that better supports conjectural reasoning, as a counterweight to the material model of textuality that now predominates. Computation is here broadly understood to mean the systematic manipulation of discrete units of information, which, in the case of language, entails the grammatical processing of strings[4] rather than the mathematical calculation of numbers to create puns, anagrams, word ladders, and other word games. The essay thus proposes that a textual scholar endeavoring to recover a prior version of a text, a diviner attempting to decipher an oracle by signs, and a poet exploiting the combinatorial play of language collectively draw on the same library of semiotic operations, which are amenable to algorithmic expression and simulation.
Here Kraus explicitly asserts that computers can operate on strings of linguistic characters as well as on numbers.
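
To make the point about strings concrete, here is a minimal sketch in Python of two of the word games mentioned in that passage, anagrams and word ladders. It is my own illustration, not code from the essay; the function names and the small vocabulary are invented for the purpose. Notice that the operations are sorting, pairing, and comparing characters. No arithmetic on the words themselves is involved.

```python
def is_anagram(a, b):
    """Two words are anagrams if they contain the same letters, rearranged."""
    return sorted(a.lower()) == sorted(b.lower())


def ladder_neighbours(word, vocabulary):
    """Words of the same length that differ from `word` in exactly one position."""
    return [
        candidate
        for candidate in vocabulary
        if len(candidate) == len(word)
        and sum(x != y for x, y in zip(candidate, word)) == 1
    ]


# Hypothetical mini-vocabulary, chosen only for illustration.
VOCAB = ["cold", "cord", "card", "ward", "warm", "word"]

print(is_anagram("listen", "silent"))
print(ladder_neighbours("cord", VOCAB))
```

The only counting here is the tally of mismatched positions, which is bookkeeping about the strings and not a calculation on numbers that the strings stand for.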

Her corrective, however, isn’t forceful enough. When Alan Turing formalized the concept of computation as the operation of an abstract machine – we now talk of Turing machines – he talked of that machine as reading symbols from and writing them to a paper tape according to a set of rules. He didn’t specify what those symbols meant or how, if at all, they were related to objects and events in the external world. His conception was very abstract and general: the machine processed symbols. That’s it.
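
A small sketch may help here. The Python below simulates a one-tape machine of the kind just described: it reads a symbol, consults a rule table, writes a symbol, and moves the head. The rule table, the state names, and the choice of "_" as the blank symbol are all my assumptions for the sake of illustration, and the machine is a simplified variant, not a transcription of Turing's own formalism.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a one-tape machine.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "L" or "R". The machine stops in state "halt".
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        new_symbol, move, state = rules[(state, symbol)]
        cells[head] = new_symbol
        head += 1 if move == "R" else -1
    if not cells:
        return ""
    low, high = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(low, high + 1)).strip(blank)


# Hypothetical rule table: swap every "a" and "b", halt at the first blank.
SWAP_RULES = {
    ("start", "a"): ("b", "R", "start"),
    ("start", "b"): ("a", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

result = run_turing_machine(SWAP_RULES, "abba")
print(result)
```

Nothing in the rule table says what "a" and "b" stand for. They could be letters, notes, or chess moves; the machine neither knows nor cares.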

Historically, the task of translating from one natural language to another is one of the problems on which the modern disciplines of computer science and engineering were founded. Research on machine translation began in the 1950s almost as soon as there were digital computers with the requisite capacity. That task is not about number crunching. It begins and ends in language.

* * * * *

Let's conclude with Franco Moretti, Network Theory, Plot Analysis (2011). I want to look at a long passage at the end where he starts out talking about quantification and ends up somewhere else (p. 11):
The idea behind this study, clearly stated in its opening page, was, very simply, that network theory could offer a way to quantify plot, thus providing an essential piece that was still missing from computational analyses of literature. Once I started working in earnest, though, I soon realized that the machine-gathering of the data, essential to large-scale quantification, was not yet a realistic possibility... So, from its very first section, the essay drifted from quantification to the qualitative analysis of plot: the advantage of thinking in terms of space rather than time; its segmentation into regions, instead of episodes; the new, non-anthropomorphic idea of the protagonist; or, even, the “undoing” of narrative structures occasioned by the removal of specific vertices in the network.

Looking back at the work done, I wouldn’t call this change of direction a mistake: after all, network theory does help us redefine some key aspects of the theory of plot, which is an important aspect of literary study. This is not the theory’s original aim, of course, but then again, a change of purpose – a “refunctionalization”, as the Russian Formalists called it – is often what happens to a system of thought traveling from one discipline to another....

No, I did not need network theory; but I probably needed networks.... What I took from network theory were less concepts than visualization: the possibility of extracting characters and interactions from a dramatic structure, and turning them into a set of signs that I could see at a glance, in a two-dimensional space.
Moretti started with quantification and ended with visualization, and visualization is ubiquitous in the analytical work of digital humanists. It’s necessary to get conceptual purchase on the data.

Here’s a pair of visualizations from Moretti’s most recent pamphlet, “Operationalizing”: or, the Function of Measurement in Modern Literary Theory (December 2013, p. 7).

[Figure: Moretti, Antigone: bar chart of word-space (top) and character network (bottom)]

The top visualization is an ordinary bar chart. The length of a bar is proportional to the size of a character’s “word-space” in Antigone. While one could express this information verbally – Creon spoke 28.7% of the words; the chorus has 19.8%, etc. – that’s not a very good way of presenting the information.
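
For readers who want to see how such a chart might be produced, here is a minimal sketch, assuming the play has already been marked up as a list of (speaker, speech) pairs. That markup, the function name word_space, and the toy speeches are my assumptions; the toy data is invented and is not the text of Antigone, and I am not claiming this is how Moretti computed his figures.

```python
from collections import Counter


def word_space(speeches):
    """Given (speaker, text) pairs, return each speaker's share of all words spoken."""
    counts = Counter()
    for speaker, text in speeches:
        counts[speaker] += len(text.split())
    total = sum(counts.values())
    return {speaker: n / total for speaker, n in counts.items()}


# Toy input, invented for illustration only.
TOY_SPEECHES = [
    ("A", "one two three four"),
    ("B", "one two"),
    ("A", "one two"),
    ("C", "one two"),
]

shares = word_space(TOY_SPEECHES)

# A crude text bar chart: one "#" per five percent of the words.
for speaker, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{speaker:4s} {'#' * round(share * 20)} {share:.1%}")
```

Even in this crude form the bars do what the percentages alone do not: they let you take in the relative sizes at a glance.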

The bottom visualization is even more resistant to verbal formulation. Sure, you could do it, but the pile of words would be big and it would be all but impossible to get the synoptic view you have in a single glance at the graph – as mathematicians call such network objects. The nodes in the graph represent characters in Antigone, the same characters as in the bar chart, while the links (also called edges or arcs) indicate relationships between the characters. The older pamphlet had over 50 such diagrams, though without the arrows and variable line weight. It’s those diagrams that Moretti discovered on the way to quantification.

What are those diagrams about? Let me suggest that they are about patterns. Yes, I know, the word is absurdly general, but hear me out.

That bar chart depicts a pattern of quantitative relationships, and does so better and more usefully than a bunch of verbal statements. You look at it and see the pattern.

The pattern in the network diagram is harder to characterize. It’s a pattern of relationships among characters. What kind of relationships? Dramatic relationships? That, I admit, is weak. But if you read Moretti’s pamphlet, you’ll see what’s going on.

The important point is what happens when you get such diagrams based on a bunch of different texts. You can see, at a glance, that there are different patterns in different texts. While each such diagram represents the reduction of a text to a model, the patterns in themselves are irreducible. They are a primary object of description and analysis.

And that is my point: patterns.

As far as I know Moretti did not use a computer to discover those network patterns. He used a computer to draw them, given human input, but he didn’t feed texts into a computer that then “read” them and compiled the diagrams automatically. Moretti read the texts, identified the characters, and drew the diagrams by hand, which were then redrawn using computer tools.

It seems to me that the automatic generation of such diagrams from textual input may be within the capacity of current computing technology, but that’s beside the point. Those diagrams are very much in the spirit, if you will, of computing.
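
To give a sense of what that spirit looks like in practice, here is a hedged sketch of how a network with arrows and line weights could be assembled once a human reader has recorded who speaks to whom. The input format of (speaker, addressee, speech) triples, the function names, and the toy data are all my assumptions. This is not Moretti's procedure, and the hard part, deciding who is addressing whom, is still done by the reader.

```python
from collections import defaultdict


def build_network(exchanges):
    """Given (speaker, addressee, text) triples, return a directed, weighted graph.

    The result maps speaker -> {addressee: number of words spoken to them}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for speaker, addressee, text in exchanges:
        graph[speaker][addressee] += len(text.split())
    return {speaker: dict(targets) for speaker, targets in graph.items()}


def degree(graph):
    """Count the distinct characters each character is linked to, in either direction."""
    neighbours = defaultdict(set)
    for speaker, targets in graph.items():
        for addressee in targets:
            neighbours[speaker].add(addressee)
            neighbours[addressee].add(speaker)
    return {character: len(linked) for character, linked in neighbours.items()}


# Toy, hand-annotated exchanges, invented for illustration only.
TOY_EXCHANGES = [
    ("A", "B", "one two three"),
    ("B", "A", "one"),
    ("A", "C", "one two"),
    ("A", "B", "one two"),
]

network = build_network(TOY_EXCHANGES)
print(network)
print(degree(network))
```

The nested dictionary is the graph: the keys are nodes, the inner keys are the arrows, and the word counts are the line weights. Whatever pattern there is lives in that structure, not in any single number.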
