Here is my recent working paper:
William Benzon, Toward a Theory of the Corpus, Working Paper, December 31, 2018, 46 pp. Academia, https://www.academia.edu/38066424/Toward_a_Theory_of_the_Corpus_Toward_a_Theory_of_the_Corpus_-by;
Calling my topic “theory of the corpus” has always seemed a bit strange to me, but I couldn’t quite think of anything better. Still can’t.
I’m not interested in corpus linguistics in general, though some of my thoughts may be germane to it. I’m interested in the corpus in the context of literary analysis and description (aka literary criticism). In that context I’ve got three concerns:
1.) The relationship between digital criticism on the one hand and standard “close reading” for meaning on the other.
2.) The lack of interest in old-school computational linguistics based on symbolic computation rather than statistical techniques.
3.) The failure – though “failure” is probably not the right word – to realize just what might be possible through analysis based on a corpus.
It seems to me that knowledge of #2 would help with #1 and #3.
Caveat: This is one of those posts where I’m basically thinking out loud.
The problem of meaning
More traditional literary critics see a real mismatch between digital or computational criticism and literary subject matter. In one form the concern is mostly a desire to see how meaning fits into the general program (I’m thinking here of Alan Liu). In a somewhat less sympathetic form the issue is akin to the objection Geoffrey Hartman registered against linguistics and “technical structuralism” in The Fate of Reading (1975). For many, though, I fear it is just flat-out antipathy toward anything that looks like math and science.
The digital critics see no problem, except that they are constantly facing this objection. They’re not interested in supplanting or replacing interpretive criticism. They’re just doing something different. That’s fine, as far as it goes.
To some extent they see themselves as sociologists, and their use of technical analytic machinery is much like that of sociologists. Sociologists are interested in human social behavior at a variety of scales. They make observations, many of which can be quantified, and they use statistics to analyze the data. Just tools. The tools are in one conceptual world while the phenomena under examination are in a different conceptual world. No problem. Boundaries and relationships are clear.
But texts as an object domain are somewhat different from social behavior as an object domain. When corpus techniques like topic analysis and vector semantics are being used, they are, in some way, of a piece with the object domain. They are not so thoroughly outside it. That becomes more apparent through knowledge of old-school computational linguistics (#2 above).
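For readers who haven’t seen vector semantics from the inside, here is a minimal sketch of its simplest count-based form. It assumes a made-up four-sentence corpus and a window of two words on either side; the function names `cooccurrence_vectors` and `cosine` are hypothetical, chosen for illustration. Each word becomes a point in a space with one dimension per vocabulary word, and words that keep the same company end up close together. Topic models and neural embeddings such as word2vec are more elaborate, but they rest on the same distributional idea: the geometry is computed from the texts themselves.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus, invented for illustration only.
corpus = [
    "the ship sailed across the sea",
    "the boat sailed across the lake",
    "the poet wrote a sonnet about love",
    "the poet wrote an elegy about death",
]
vecs = cooccurrence_vectors(corpus)

print(round(cosine(vecs["ship"], vecs["boat"]), 3))     # 1.0
print(round(cosine(vecs["sonnet"], vecs["elegy"]), 3))  # 0.5
print(round(cosine(vecs["ship"], vecs["sonnet"]), 3))   # 0.0
```

On this toy corpus “ship” and “boat” occur in identical contexts, “sonnet” and “elegy” share half their context, and “ship” and “sonnet” share none. Real corpora are vastly larger and usually reweight the raw counts, but the principle is the same.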
So, conventional critics see computational criticism as being in an alien world, and that’s a problem. Computational critics don’t see a problem because they see literary texts as an object domain for the use of statistical tools, just as sociologists see social behavior. I’m saying something like: Not only is it NOT an alien domain (because language itself is a computational phenomenon), it’s the same domain; and because of that we can do even more.
That last paragraph is poorly put and I don’t know how to fix it. But the direction is good. Digital critics fail to see language itself as a computational phenomenon, and that hampers the “depth” to which they’re able to employ their tools. It truncates their field of potential inference.
Closure (on mind)?
From my post of 4 January 2018, What is the mind (and what can we know of it)?
As for minds and brains, well, there we have my metaphor of the mind as neural weather. That harks back to Walter Freeman’s work on the complex dynamics of the nervous system. And, of course, Freeman talks about the brain’s high-dimensional state space, which is some kind of relative of the high-dimensional semantic space that I talk about above. Just how close a relative is hard to say. Each point in Freeman’s space is a state of the brain. Each point in a corpus model is a word meaning. Those are very different things. We might, however, bring them together by thinking of each point in a corpus model as the target of focal attention.
That’s what I wrote when I first posted. But then I got to thinking about the work I’d done 15 years ago on attractor nets [1, 2], which I thought of as networks in which the nodes were basins of attraction in an assemblage of dynamical systems, such as the human brain.
In THAT context – which, alas, defies easy summary – it seemed to me that we could think of each word meaning (signified, in Saussure’s terminology) as a basin of attraction. That would provide a way of linking a fundamentally dynamic account of the brain with a basically static model of semantic relationships. The network of relationships among the attractor basins is relatively stable, and THAT’s what we can recover/approximate through statistical analysis of word distribution in texts. Words that co-occur do so because their meanings are coupled in a local attractor net.
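The notion of a basin of attraction can be made concrete with a textbook model. The sketch below is a classic Hopfield network, swapped in purely for illustration; it is NOT the attractor-net formalism of the working papers cited here. It assumes two made-up eight-unit patterns standing in for two “meanings,” and the helper names `train_hopfield` and `settle` are hypothetical. The learned weights are fixed, which is the static part; the settling loop is the dynamic part. A cue that starts near a stored pattern falls into that pattern’s basin.

```python
import numpy as np

# Illustrative only: a classic Hopfield network, NOT Benzon's attractor nets.

def train_hopfield(patterns):
    """Hebbian learning: each stored +1/-1 pattern becomes a stable state (an attractor)."""
    patterns = np.asarray(patterns, dtype=float)
    n_units = patterns.shape[1]
    weights = patterns.T @ patterns / n_units  # sum of outer products, scaled
    np.fill_diagonal(weights, 0.0)             # no self-connections
    return weights

def settle(weights, state, max_sweeps=20):
    """Update one unit at a time until nothing changes, i.e. the state sits at an attractor."""
    state = np.array(state, dtype=float)  # copy so the caller's cue is untouched
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = 1.0 if weights[i] @ state >= 0 else -1.0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Two hypothetical "meanings", encoded as orthogonal 8-unit patterns.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hopfield([p1, p2])

# Cues: each stored pattern with one unit flipped.
cue_near_p1 = [1, 1, 1, 1, -1, -1, -1, 1]
cue_near_p2 = [-1, -1, 1, -1, 1, -1, 1, -1]

print(np.array_equal(settle(W, cue_near_p1), p1))  # True
print(np.array_equal(settle(W, cue_near_p2), p2))  # True
```

Units are updated one at a time because, with symmetric weights and no self-connections, that procedure is guaranteed to stop at a fixed point, whereas updating every unit at once can oscillate. In this toy case each corrupted cue is pulled back to the pattern it started near.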
[1] William Benzon, Attractor Nets, Series I: Notes Toward a New Theory of Mind, Logic and Dynamics in Relational Networks, Working Paper, 2011, 52 pp., Academia, https://www.academia.edu/9012847/Attractor_Nets_Series_I_Notes_Toward_a_New_Theory_of_Mind_Logic_and_Dynamics_in_Relational_Networks.
[2] William Benzon, Attractor Nets 2011: Diagrams for a New Theory of Mind, Working Paper, 55 pp., Academia, https://www.academia.edu/9012810/Attractor_Nets_2011_Diagrams_for_a_New_Theory_of_Mind.