Wednesday, July 6, 2016

It’s not a question of manpower: a brief remark on computational criticism

One reason that has been given for computational criticism (aka ‘distant reading’) is that it is the only way we are going to examine books beyond the canon and that, until we do so, our understanding of literary history is partial and biased. Yes, it is likely true that this will be the only way we examine all those books that are no longer read, or even readily available. But this justification is misleading to the extent that it implies that the problem is one of manpower, as though we wouldn’t bother with computational criticism if the critical community had the time to read all those books. I submit that, on the contrary, even if we had the manpower, we’d still undertake computational analysis of 1000s of volumes at a time.

I have no idea how many volumes would be in the complete corpus of English literature; does anyone? But for the sake of argument let’s pick a number, say 100,000, which is low even if we confine ourselves to work published before the 20th century. Imagine that we have a thousand critics available for reading, which implies that each of them will read 100 volumes. Assuming no other intellectual duties, they could do that reading in a year. Now, how do they make sense of their 100,000 books? As you think about that, recall what a much larger number of critics has done for a much smaller number of volumes over the course of the last century.

What do we want from these critics? Well, topic analysis is a popular form of computational criticism, so why not do a manual version with all the (imaginary) critical manpower we’ve got available? Imagine that you are one of the 1000 critics. How will you undertake a topic analysis of your 100 volumes?

For that matter, how will you do it for the first volume? I suppose that you start reading and note down each topic as you come to it. When a topic recurs, you give it another check mark. How do you name the topics? How do you indicate what’s in them? You could list words, as is done in computational topic analysis. You could also use phrases and sentences. What happens as you continue reading? Perhaps what you thought was a ‘horses’ topic at the beginning turns out to be, say, a ‘horse racing’ topic instead. So you’ve got to revise your topic list. I would imagine that maintaining coherence and consistency would be a problem as you read through your 100 volumes. Just think of all the scraps of paper and the computer files you’ll be working with. This is going to take a lot of time beyond that required for simply reading the books.

But it’s not just you. There are 1000 critics, each reading and analyzing 100 volumes. How do they maintain consistency among themselves? The mind boggles.
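For comparison, here is roughly what the computational version of that word-listing step looks like. The sketch below is a minimal illustration using scikit-learn’s LatentDirichletAllocation on a toy corpus of four made-up sentences; the documents, the choice of two topics, and every parameter are purely illustrative, not anyone’s actual method. What the machine produces is just lists of co-occurring words, one list per topic, with no labels attached.

```python
# A minimal sketch of the computational version of the word-listing step.
# The tiny corpus and all parameter choices are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the horse ran the race and the jockey collected the purse",
    "the stable kept horses for the hunt and for the plough",
    "the ship sailed from the harbor under a full moon",
    "the sailors watched the sea and the stars from the deck",
]

# Count words per document, dropping common English stopwords.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# Fit a two-topic model; a real corpus would use far more documents and topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Each "topic" is just a weighting over the vocabulary; printing the top words
# gives the machine's analogue of the critic's labelled topic list.
words = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [words[j] for j in weights.argsort()[::-1][:5]]
    print(f"topic {i}: {', '.join(top)}")
```

And the machine will treat the 100,000th volume exactly as it treated the first, which is precisely the consistency the 1000 human critics would struggle to maintain.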

My point is that conducting a topic analysis of a corpus gives us a view of that corpus that we could not get with manual methods, that is, through the comparison and distillation of ‘close’ readings of 1000s of books by 1000s of readers. Topic analysis gives us something new. Whether this something is valuable is a different question. But it’s not just a poor substitute for close readings that we’re never going to do.

The fact is, even if we had 1000 critics analyzing 100 volumes each, we’d probably conduct a topic analysis, and more, as a means of bringing some consistency to all those ‘manual’ analyses.

Topic analysis, of course, is not the only form of computational criticism. But the argument I’ve made using it as an example will apply across the board. We are doing something fundamentally new.
