Sunday, July 29, 2018

Virtual Reading as a path through a high-dimensional semantic space [#DH]

Over the past year or so I’ve been thinking about computing virtual readings of texts where the reading is in effect a path through a high-dimensional semantic space. I’ve even written a working paper about it, Virtual Reading: The Prospero Project Redux. I’ve just now discovered that Andrew Piper has taken steps in that direction, though not in those terms. The paper is:
Andrew Piper, Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel, New Literary History, Volume 46, Number 1, Winter 2015, pp. 63-98. DOI: https://doi.org/10.1353/nlh.2015.0008
Piper is interested in conversion stories in autobiographies and novels. He’s also interested in making a methodological point about an exploratory style of investigation that moves back and forth between qualitative and quantitative forms of reasoning. That’s an interesting and important point, very important, but let’s set it aside. I’m interested in those conversion stories.

He takes Augustine’s Confessions as his starting point. Augustine puts the story of his conversion near the end of Book 8, of thirteen. Do the chapters prior to the conversion take place in a different region of semantic space from those after the conversion? With chapters (Augustine calls them books) as analytic unit, Piper uses multidimensional scaling (MDS) to find out. I’ve taken his Figure 2 and added the shading (p. 71):

Augustine 1

We can see that books 1-10 occupy a position in semantic space that’s distinctly different from books 11-13 and, furthermore, that “The later books are not just further away from the earlier books, they are further away from each other” (p. 72). Now, though Piper himself doesn’t quite do so, it is easy enough to imagine each book as a point in a path, and track that path through the space.

But Piper does note that “Book 13 appears to return back to the original ten books in more circular fashion” (p. 71) and that, by a standard statistical measure, Book 13 belongs with 1-10. In the following figure I’ve highlighted both the relationship between Books 1 and 13 and the long leap between 10 and 11:

Augustine 2
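
To make the idea of a path concrete, here is a minimal sketch that treats each book as a point, measures the step between consecutive books, and reports the longest leap and how far the last point is from the first. The coordinates in `toy_path` are hypothetical values invented for illustration; they are not read off Piper's figure, and the function names are likewise just assumptions of this sketch.

```python
import math


def step_lengths(path):
    """Euclidean distance between each consecutive pair of points."""
    return [math.dist(a, b) for a, b in zip(path, path[1:])]


def longest_leap(path):
    """Return (i, length) for the longest step, which runs from point i to point i + 1."""
    steps = step_lengths(path)
    i = max(range(len(steps)), key=steps.__getitem__)
    return i, steps[i]


def return_distance(path):
    """Distance from the last point of the path back to the first."""
    return math.dist(path[-1], path[0])


# Hypothetical 2-D positions for thirteen books (made up, not Piper's data).
toy_path = [
    (0.0, 0.0), (0.25, 0.0), (0.25, 0.25), (0.0, 0.5), (0.5, 0.25),
    (0.5, 0.5), (0.25, 0.75), (0.75, 0.5), (0.5, 0.75), (0.75, 0.75),
    (3.75, 4.75), (4.75, 1.75), (0.25, 0.25),
]

if __name__ == "__main__":
    i, length = longest_leap(toy_path)
    # Books are numbered from 1, list positions from 0.
    print(f"longest leap: Book {i + 1} -> Book {i + 2}, length {length:.2f}")
    print(f"distance from last book back to first: {return_distance(toy_path):.2f}")
```

With these made-up coordinates the longest step is the one from Book 10 to Book 11, and the last point lands back among the first ten. That is the pattern the figure highlights, but the numbers here only illustrate the bookkeeping and say nothing about the Confessions itself.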

From there Piper goes on to develop a pair of numerical measures that allows him to test a body of texts (450 of them) for this kind of semantic structure. Which is all well and good.

But I want to dwell on the mere fact that Piper has, in effect, shown that Augustine’s course through the Confessions moves through distinct regions of semantic space and that we have statistical tools we can use to trace such paths. Now, Piper used the chapter as his unit of analysis. That’s a rather large unit, yielding a rather crude approximation of the reader’s path. Hey, Jake. How do you drive from New York to LA? Well, I generally go to Pittsburgh, then St. Louis, Denver and then LA. Well, yeah, sure. That’s the general idea. But as driving instructions, it’s not very useful.

What if Piper had used 1000-word chunks, or paragraphs, perhaps even sentences, as a unit of analysis? I surely don’t know. It’s quite possible that, for his purposes, the added detail would have little or no value. But it might be that a more fine-grained analysis would show that there are places in 1-10 territory where the path strays over there into 11-13 territory. And if so, what then? Who knows? We won’t know until we take a look.
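
As a rough sketch of what a finer unit of analysis might involve, the following cuts a text into fixed-size word chunks, counts the words in each chunk, and computes pairwise cosine distances between chunks. A distance matrix of this kind is the sort of input that MDS works from. The bag-of-words representation, the cosine distance, the tokenizing regex, and the toy text are all assumptions made for illustration; this is not a reconstruction of Piper's feature set.

```python
import math
import re
from collections import Counter


def chunk_words(text, size):
    """Split text into consecutive chunks of `size` words (the last may be shorter)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine_distance(a, b):
    """One minus the cosine similarity of two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty chunk as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(text, size):
    """Pairwise cosine distances between the word-count vectors of each chunk."""
    counts = [Counter(chunk) for chunk in chunk_words(text, size)]
    return [[cosine_distance(a, b) for b in counts] for a in counts]


# Toy text, invented for illustration: two chunks on one theme, two on another.
toy_text = "sin flesh youth sin flesh youth time heaven earth time heaven earth"

if __name__ == "__main__":
    for row in distance_matrix(toy_text, 3):
        print(" ".join(f"{d:.2f}" for d in row))
```

Changing `size` from, say, a chapter's worth of words to 1000 or fewer is the whole adjustment. Whether the resulting path shows excursions from one region into another is an empirical question the sketch cannot answer.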

I note finally that this is a kind of formal analysis. When Piper looks for similar texts, he's looking for geometrical congruence that is independent of the specific semantic values of the underlying space.
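
One way to picture congruence that ignores where a path sits in the space is to compare paths by their internal distances alone. The sketch below is an illustrative assumption and is not Piper's pair of measures: it builds a "shape signature" from all pairwise distances between points, scaled by the largest one, so that shifting, rotating, or uniformly rescaling a path leaves the signature unchanged. It also assumes the two paths have the same number of points.

```python
import math


def shape_signature(path):
    """Pairwise distances between all points, divided by the largest distance."""
    n = len(path)
    dists = [[math.dist(path[i], path[j]) for j in range(n)] for i in range(n)]
    largest = max(max(row) for row in dists)
    if largest == 0:
        return dists  # all points coincide, nothing to scale
    return [[d / largest for d in row] for row in dists]


def signature_gap(path_a, path_b):
    """Largest absolute difference between two signatures of equal-length paths."""
    sig_a, sig_b = shape_signature(path_a), shape_signature(path_b)
    return max(abs(x - y) for ra, rb in zip(sig_a, sig_b) for x, y in zip(ra, rb))


# Hypothetical paths: b is a shifted and doubled in size, c is a straight line.
path_a = [(0, 0), (1, 0), (1, 1), (5, 4)]
path_b = [(10, 10), (12, 10), (12, 12), (20, 18)]
path_c = [(0, 0), (1, 0), (2, 0), (3, 0)]

if __name__ == "__main__":
    print(f"a vs b: {signature_gap(path_a, path_b):.3f}")
    print(f"a vs c: {signature_gap(path_a, path_c):.3f}")
```

Here `path_a` and `path_b` occupy entirely different coordinates yet have essentially the same signature, while `path_c` does not. Comparing texts with different numbers of units would need some further step, such as resampling, which this sketch leaves out.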

For more thoughts on this, check out my working paper, Virtual Reading: The Prospero Project Redux, where I discuss Heart of Darkness and Shakespeare.
