This case involves aspects of the first two cases we’ve considered. Like the analysis of Jaws, it began with an open-ended examination of a particular text, Joseph Conrad’s Heart of Darkness. That open-ended process led to a specific moment that is like the beginning of my hunt for the history of the Xanadu meme. In this case I noticed that, at a certain point, the narrative gets ahead of itself. I then noticed that the paragraph in which that occurred was extraordinarily long. “I wonder,” thought I to myself, “if that’s the longest paragraph in the text?” It didn’t seem like a very interesting or important question. So events get a bit out of order in the paragraph, so what? The opportunity cost of answering the question was low, however, so I set out to answer it. When I charted the distribution of paragraph lengths in text order I saw a surprising pattern. That changed the way I thought about the text.
This case brings up another issue as well, one that didn’t exist in the other two: Is it real? I discovered a pattern of words in Heart of Darkness that is of a kind not recognized by literary scholars. What follows from that? What kinds of questions does it raise and how are they to be answered?
Story and plot, Center-point construction
I think the thing to do is to run through the decisions I made and why I made them.
1. Read Heart of Darkness
I had finished a series of posts about Apocalypse Now and decided that it was time to read Joseph Conrad’s Heart of Darkness. Why? Because the movie was loosely based on Conrad’s book, which I’d never read. So I took it from the library and started reading it. At some point I decided I might actually want to work on the text so I downloaded it from Project Gutenberg. Why? Mostly so I could annotate it. At some point I noticed that temporal anomaly I mentioned above.
I mentioned that anomaly in the first post I did on the book, Heart of Darkness: Narration and Temporal Displacement (July 10, 2011).
2. Examine a temporal anomaly
As you know, Heart of Darkness is a story about a pilot named Marlow taking a boat up the Congo River to a trading station whose proprietor, a man named “Kurtz,” had gone silent. I was at a point in the story when we had not yet reached that trading station. All of a sudden I found myself reading about something that obviously had happened at the station. “That’s odd,” I thought. Writers sometimes do that, I know, and one is supposed to notice it. I fished around in the text to verify that, yes, we’d skipped ahead in time, and in the process noticed that this particular paragraph, which was a precis of Kurtz’s life, seemed extraordinarily long. Kurtz is one of two central characters (Marlow is the other); this paragraph is the first time we learn much about him.
By this time I had either started blogging about the book – something I do, blog about books in the process of reading – or I had decided to do so. I thought that, in the process of writing about this paragraph I would remark on the length of the paragraph, suggesting that it was possibly the longest paragraph in the text.
3. Check the length of that paragraph
At that point I realized that that’s something I could check easily enough. As I had an electronic text, I wouldn’t have to count the words by hand. Rather, I could enlist the help of Microsoft Word, which has a function that counts the number of words in a chunk of text that one has selected. Sure, I’d have to do that for every paragraph in the text, and I’d have to number the paragraphs to keep track, but that’s easy to do, though tedious. So I did it. One by one I counted the number of words in each paragraph and entered the total into a spreadsheet. When I was done, verifying that that particular paragraph, number 103 in the Gutenberg text, was the longest was easy: just sort the spreadsheet entries by length. Once that was done paragraph 103 was at the top of the list, with 1531 words.
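The same tally could also be scripted rather than done by hand. What follows is a minimal sketch in Python, not the procedure described above. It assumes a plain-text file in which paragraphs are separated by blank lines and from which any Project Gutenberg header and footer have already been removed. The function names `paragraph_lengths` and `longest_first` are hypothetical, and splitting on whitespace may give counts that differ slightly from Microsoft Word’s.

```python
import re


def paragraph_lengths(text):
    """Return (paragraph_number, word_count) pairs in text order, numbered from 1."""
    # Assumption: one or more blank lines mark a paragraph break.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # Assumption: a "word" is any run of non-whitespace characters.
    return [(i, len(p.split())) for i, p in enumerate(paragraphs, start=1)]


def longest_first(lengths):
    """Sort the (paragraph_number, word_count) pairs by word count, longest first."""
    return sorted(lengths, key=lambda pair: pair[1], reverse=True)


# A toy three-paragraph text stands in for the novella here.
sample = "One two three.\n\nFour five.\n\nSix seven eight nine ten."
lengths = paragraph_lengths(sample)
ranked = longest_first(lengths)
print(ranked[0])  # the longest paragraph and its word count
```

Keeping the paragraph number next to each count matters: the sorted list answers the question of which paragraph is longest, while the unsorted list preserves text order for charting.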
I mention that in my third post, Closure, Attachment, and Abstract Objects in Heart of Darkness (July 12, 2011), where I also give the length of the 2nd, 3rd, and 4th longest (1129, 1103, and 865 words respectively). Otherwise I said nothing about paragraph lengths.
On July 16 I did a post in which I asserted that paragraph 103, which I was now calling the nexus, was the structural center of the text: The Heart of Heart of Darkness. That suggested ring-composition, which interests me a great deal. Ring-composition foregrounds the structural center of the text. As I was not sure that Heart exhibited all the characteristics of ring-composition, I coined the term “center point construction,” after the characteristic it did exhibit. [For what it’s worth, the late Mary Douglas brought ring-composition to my attention. Her last book was about the subject: Thinking in Circles (Yale 2007).]
In that post I also pointed out that, not only was the nexus paragraph the structural center, but that it was very strongly marked. At that point in the story Marlow’s boat had been attacked by natives on the shore and the helmsman was speared in the chest and fell bleeding on the deck. At that point Marlow breaks off from the story and inserts the nexus, which gives us an overview of Kurtz’s life, into his narration. When he’s done with the nexus he returns to the main narrative at the point where the helmsman is bleeding out on the deck. Marlow tosses the helmsman overboard. This paragraph, where for the first time we learn who Kurtz was and what he did, is flanked by the death of the helmsman.
A day later, July 17, I made the following two charts in Excel. Each bar stands for a paragraph in the text. The length of the bar is proportional to the number of words in the paragraph. In this chart the paragraphs are ordered according to their length:
[Figure 1: Paragraph lengths in Heart of Darkness, ordered by length]
I had no expectations about the shape of the plot when I made the chart. But that distribution appears to be lawful. What do we know about the distribution of paragraph lengths in texts?
In this chart the paragraphs are listed in their order in the text:
[Figure 2: Paragraph lengths in Heart of Darkness, in text order]
That distribution is very spiky. The extreme length (1531 words) of the nexus paragraph, 103, appears quite remarkable in this chart, as do the lengths of the next two longest paragraphs, 1129 and 1103 words.
4. Get others involved
By now I was actively investigating paragraph length and its implications for the formal structure of Heart of Darkness. For some reason, I don’t recall why, I wondered whether or not the distribution of paragraph lengths was a power law distribution. So I wrote to Cosma Shalizi and asked him to check it for me. Those charts (Figures 1 and 2) show up in a post from July 18, 2011: Digital Humanities Sandbox Goes to the Congo.
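For readers who want a first look at the power-law question themselves, here is a crude diagnostic sketched in Python. It is not the check requested above, and the source does not report how that check was done. The sketch ranks paragraphs from longest to shortest and fits a straight line to log(length) against log(rank). The function name `loglog_slope` is hypothetical. A roughly straight line is consistent with a power law but does not establish one, since other heavy-tailed distributions can look similar on such a plot.

```python
import math


def loglog_slope(lengths):
    """Least-squares slope of log(length) against log(rank).

    Rank 1 is the longest paragraph. This is only a rough visual-style
    diagnostic, not a statistical test of the power-law hypothesis.
    """
    ranked = sorted((n for n in lengths if n > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive lengths")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(n) for n in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data that follows length = 1000 / rank exactly, so the slope is -1.
synthetic = [1000.0 / rank for rank in range(1, 51)]
slope = loglog_slope(synthetic)
print(round(slope, 6))
```

The synthetic list is there only to confirm that the function recovers a known exponent; real paragraph counts would replace it.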
I also sent a link to that post to Mark Liberman, who runs a group blog on language and linguistics, Language Log, hoping he would post notice of it there. I wanted to find out whether anyone knew anything about the distribution of paragraph lengths in texts and figured that the readers of Language Log might know. He obliged me that day with a post, Markov’s Heart of Darkness, which elicited a lively discussion but, alas, no further information about paragraph lengths. Liberman did post some work he was prompted to do on some other texts, including Conrad’s Nostromo and Henry James’s The Golden Bowl.
* * * * *
I’ve continued to think about this business of paragraph length, but want to set that aside. Let’s take up the question: Would an AI have undertaken the investigation that I’ve outlined so far?
What would an AI do?
I want to review those four steps, pointing out just why I took them and asking whether or not a current AI would be likely to have taken them.
1. Read Heart of Darkness
As I’ve said, I did this mostly out of curiosity, motivated by my work on Apocalypse Now. There’s nothing curious about the link between the movie and the text; that’s part of the public record and is well known. Let’s set aside the fact that no current AI would be able to view a movie and comprehend what’s taking place at even the most superficial level. Given that it had the capability, why would an AI watch a movie unless directed to do so? Of course, during training, an AI would be “fed” lots of movies. But why, once trained, would an AI watch a movie unless directed to do so by a user? And, having watched Apocalypse Now, why would that AI then go on to read Heart of Darkness? Current AIs lack curiosity. It’s not at all obvious how we’ll endow future ones with curiosity.
2. Examine a temporal anomaly
Why did I notice that temporal anomaly? More than that, why did I verify it, investigate it, and deliberately register and think about the content of the paragraph in which it occurred? Here I’m curious about how “ordinary” readers experienced that anomaly. It doesn’t make the story difficult to follow. We know that we’re in a digression from the main story and can assume that we’ll get back. Why did I make a big deal of it?
Because I’m trained as a literary critic and I have a particular interest in structural issues. Thus when I studied Tristram Shandy as an undergraduate I learned about the Russian Formalist critics, who made a distinction between story (or fabula in Russian) and plot (or syuzhet). As you know, in Tristram Shandy the plot scrambles the story in an extreme way. Later when I wrote my doctoral dissertation, “Cognitive Science and Literary Theory,” I made a point of talking about the computational issues involved in decoupling plot and story. So, I paid attention to this issue because I have a particular interest in such matters.
At this point I’m moving beyond inquiry motivated by open-ended curiosity. I have a reason, the temporal anomaly, to be curious about that paragraph.
But why would an AI even notice it unless specifically directed to do so? That’s not at all obvious to me.
3. Check the length of that paragraph
But why was I curious about the length of that paragraph? It’s a long paragraph. So what? For whatever reason, paragraph length is something I’ve paid attention to ever since reading continental philosophy as an undergraduate. For example, I remember thinking to myself that Merleau-Ponty’s Phenomenology of Perception had many very long paragraphs. And I’ve always paid attention to the length of paragraphs in my own writing.
Still, it’s one thing to notice that the paragraph was a long one. It’s something else to actually verify that it’s the longest one in the text. If I’d had to do the count by hand, I certainly wouldn’t have done it. Still, it did take me two hours or so. Why? Nothing beyond curiosity. I didn’t suspect that anything would come of it. It wasn’t until I saw that chart depicting paragraph lengths in order in the text (Fig. 2) that I paid attention. Not only is the distribution obviously very spiky, but it also appears to be roughly pyramidal:
[Figure: Paragraph lengths in text order, showing the roughly pyramidal shape]
Our nexus paragraph is at the center of that shape. That’s when I began to think about ring-composition, which is something that was of deepest interest to me at the time. Now I was seriously interested.
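One way to put a number on “at the center” is to ask how far into the text the longest paragraph falls. The sketch below, in Python, is an illustration rather than anything done in the original investigation. It takes a list of word counts in text order and reports the position of the longest paragraph both as a fraction of the paragraph count and as a fraction of the running word count. The function name `peak_position` is hypothetical.

```python
def peak_position(lengths):
    """Locate the longest paragraph in a list of word counts given in text order.

    Returns (paragraph_number, fraction_of_paragraphs, fraction_of_words).
    Both fractions measure how far into the text the midpoint of the
    longest paragraph falls; 0.5 means dead center.
    """
    total = sum(lengths)
    if not lengths or total == 0:
        raise ValueError("need at least one non-empty paragraph")
    # Index of the longest paragraph (the first one, if there is a tie).
    peak = max(range(len(lengths)), key=lambda i: lengths[i])
    words_before = sum(lengths[:peak])
    by_paragraph = (peak + 0.5) / len(lengths)
    by_words = (words_before + lengths[peak] / 2) / total
    return peak + 1, by_paragraph, by_words


# A toy, perfectly symmetric profile standing in for real counts.
toy = [10, 20, 100, 20, 10]
number, by_paragraph, by_words = peak_position(toy)
print(number, by_paragraph, by_words)
```

Values near 0.5 on both measures would be consistent with a central placement. They say nothing by themselves about ring-composition, which depends on what happens in the text on either side of the center.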
But why would an AI even take note of paragraph length? It would be one thing if literary criticism routinely examined paragraph length. But it doesn’t. And this was the first time I’d done it myself.
Assuming that the AI somehow noticed that that paragraph was long, and then went to the trouble to make a chart, or whatever, why would it then connect that chart with ring-composition? Ring-composition is about what happens in a text, not how long the paragraphs are.
On the whole, it’s not at all obvious to me how a current AI would get from “reading” through Heart of Darkness to examining paragraph length and investigating the possibility of ring-composition in the text. The route I took is diffuse and indirect. I have a lot of experience examining literary texts. I’m used to playing hunches on things I notice. Some hunches pan out, some don’t. How do we train an AI to have hunches and follow (at least some of) them out?
4. Get others involved
Once I’d noticed that the distribution of paragraph lengths appeared to be lawful (Fig. 1), it was natural for me to wonder whether or not anything is known about paragraph lengths in texts. But just why would I be curious about that? Well, by that time I’d been studying language in one way or another for four decades: word meaning, language in sentences, language in texts, literary language. Words get counted for this, that, and the other reason, and we have things like Zipf’s law, which is about word frequency in a corpus of texts. So, for all I knew, someone had for some reason gotten curious about paragraph lengths in texts; maybe they’d been curious about historical trends in paragraph lengths over the course of decades or even centuries. Who knows?
That’s when I turned to Mark Liberman and Language Log to see if anyone knew anything. They didn’t. It is of course quite possible that someone has done some work, but that community just didn’t know it.
The question I’m asking here is: Why would an AI be curious about such a thing? That is, given that an AI had somehow gone through the steps I’d gone through and was “staring” at a chart showing the distribution of paragraph lengths in Heart of Darkness (Fig. 1), why would it become curious about the general question of paragraph lengths in texts? The reason I gave for doing so was rather diffuse. On the other hand, if we’re willing to grant that an AI would have the kind of curiosity that got it this far, then it’s not too much to grant it the curiosity to inquire about the general question.
And if it does that, then it could set out to answer the question directly. It could, for example, go to Project Gutenberg, pick a sample of texts, and count paragraph lengths.
However, I’m finding it difficult to imagine that any AI would have the kind of curiosity required to get this far. While the various things I did may seem obvious in retrospect, that’s not how they actually unfolded. What actually happened? I played a bunch of hunches loosely based on my decades of experience in working with and thinking about literary texts and their structure. Without comparable experience, there’s no reason to do any of it.
Is this real? How do we know?
That reality is socially constructed can reasonably be construed as an uninteresting truism. How else could it be? But when we’re talking about intellectual disciplines that proposition is more substantial. Back in 1996 journalist John Horgan published a book with the provocative title, The End of Science: Facing the Limits of Science in the Twilight of the Scientific Age. He argued that a number of scientific fields have gone as far as they can go. There’s nothing left.
Setting aside the general question, which I address in a working paper based on the 2015 edition, the foundations of physics is one of the disciplines he examines. The problem there is simple: we have a proliferation of theoretical models, but no empirical evidence. What gives? In her (rather controversial) book, Lost in Math: How Beauty Leads Physics Astray, Sabine Hossenfelder has argued that the academic community of theoretical physicists has become committed to a certain way of doing things and refuses to change. That’s how academic disciplines are.
But we’re not talking about physics here, nor science at all. We’re talking about literary criticism. There’s no doubt that the pattern of paragraph lengths I’ve discovered in Heart of Darkness is real. But that’s not the kind of thing literary critics are used to taking into evidence. It has no evidentiary value in literary criticism. As for ring composition or ring-form, that is recognized by some scholars, mostly classicists and Biblical scholars, but not by most literary critics. One scholar, the late James J. Paxson, even ridiculed it in an article published in an important journal, Style, in 2001, referring to students of ring composition as “ringers.” How, then, is one to change the discipline(s) of literary criticism so that paragraph length is accorded evidentiary value and ring-composition is recognized as a legitimate phenomenon?
I have no intention of suggesting an answer to those questions beyond noting that academic disciplines do change. My point here is that the questions are real, and are “orthogonal” to the use of AI in intellectual work. I did not in fact use AI in my work on Heart of Darkness, but if I were to come up with some evidence created through AI, would that evidence thereby have any special value? Should it? Would an AI be able to recognize that something it has managed to come up with was of a new kind and so required social negotiation? Would it then know how to engage in such negotiations?
Who knows?