Tuesday, August 15, 2017

In search of a small world net: Computing an emblem in Heart of Darkness [#DH]

This post is about aesthetics, one bit of Conrad’s craft. What’s the semantic ‘center’ of Heart of Darkness? I think Conrad has indicated that quite clearly and I’m wondering if it can be investigated computationally.

I have called paragraph 103 of Heart of Darkness the Nexus because it encapsulates the story of Kurtz, the central enigma of the story and one of two central characters–the other being Marlow, a boat captain and the main narrator. Some 300 words from the paragraph’s beginning we have the following sentence: “'My Intended, my ivory, my station, my river, my—' everything belonged to him.” [I've appended the opening to the end of this post.] That opening phrase is repeated later, in a slightly different form, in paragraph 148, while the steamer is on its return trip with Kurtz on board: “My Intended, my station, my career, my ideas—these were the subjects for the occasional utterances of elevated sentiments.” So, we have the two versions:
1) My Intended, my ivory, my station, my river, my—
2) My Intended, my station, my career, my ideas
We can consolidate the different terms into a single phrase:
3) My Intended, ivory, station, river, career, ideas
Let’s think of those terms in the context of Kurtz’s life. Briefly:
My Intended: the woman he wishes to marry; her relatives didn’t think him worthy of her because he was too poor.
ivory: The potential source of Kurtz’s wealth, produced by elephants in Africa.
station: Kurtz’s place of business, but also where he took an African mistress.
river: The Congo, connecting the station to the Atlantic Ocean and thereby to Europe.
career: Kurtz went into the ivory trade to make enough money to become worthy of his Intended.
ideas: His schemes for the betterment of the Congo, written up in a 17-page document ending with the phrase, alas, ‘Exterminate all the brutes!’
Now consider them as words, without any context. They cover a wide range of things:
My Intended: fiancée, not merely a woman, but a woman in a specific social relationship.
ivory: physical substance in solid form
station: geographic locus
river: geographical feature, liquid substance
career: from the dictionary, “an occupation undertaken for a significant period of a person's life”
ideas: immaterial, mental
My hunch is that Conrad’s phrase linking those words together is emblematic of Kurtz’s life and hence of the book. I want to make computational sense of that centrality, that emblematicity (if you will). Selecting those words and then linking them together into a single phrase, that is a product of Conrad’s craft, as is placing the first occurrence of that phrase at the structural center of the text and the second occurrence somewhat later.

With this in mind, imagine a high dimensional space in which all the word types in Heart of Darkness are located using some suitable technique, such as latent semantic analysis (LSA) or one of the more recent word embedding models. Now find the six emblem terms in that space. Where are they–in relation to the space as a whole, in relation to one another? Locate a position in that space that’s equidistant from those terms. Where is that point in the whole space?
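To make those questions concrete, here is a minimal Python sketch. It assumes each word type already has a vector; the three-dimensional coordinates below are invented for illustration and are not derived from the text of Heart of Darkness. The centroid (the mean of the vectors) stands in for the central position: it minimizes the sum of squared distances to the terms, but it is not, in general, exactly equidistant from them.

```python
import math
from itertools import combinations

# Hypothetical coordinates, invented for illustration only.
# A real LSA or embedding model would supply hundreds of dimensions.
vectors = {
    "intended": (0.9, 0.1, 0.2),
    "ivory":    (0.1, 0.8, 0.1),
    "station":  (0.4, 0.5, 0.7),
    "river":    (0.2, 0.3, 0.9),
    "career":   (0.8, 0.4, 0.3),
    "ideas":    (0.7, 0.2, 0.6),
}

def centroid(points):
    """Component-wise mean of a collection of equal-length vectors."""
    points = list(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

emblem = ["intended", "ivory", "station", "river", "career", "ideas"]

# Central point of the emblem terms.
center = centroid(vectors[w] for w in emblem)

# How far apart are the terms from one another, and from their center?
pairwise = {(a, b): math.dist(vectors[a], vectors[b])
            for a, b in combinations(emblem, 2)}
to_center = {w: math.dist(vectors[w], center) for w in emblem}

for word, d in sorted(to_center.items(), key=lambda item: item[1]):
    print(f"{word:10s} {d:.3f}")
```

With real vectors, the same center could be compared against the centroid of the whole vocabulary to see where the emblem sits in the space at large.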

To investigate that I propose something like this: Create a graph where each word type is a node in the graph. Call it the text graph. Connect each node to every other node, where the length of the edge connecting the two nodes is proportional to the Euclidean distance between them in the space. [Whoops! See note below.*] Note the terms in the litany: Intended, ivory, station, river, career, ideas. River should be further from career or ideas than it is from, say, stream, creek, lake, water, or ice. And ideas should be further from, say, ivory, than it is from thoughts, feelings, perceptions, or concepts. And so forth for however many examples you care to think about.

Now, let’s start deleting edges from the text graph, starting with the longest edge and moving to shorter edges until the terms in our emblematic phrase are no longer connected to one another. They no longer form a connected subgraph of the text graph. Let’s call the resulting graph the reduced text graph.
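The construction and pruning steps might be sketched as follows. The two-dimensional coordinates are invented, the function names build_text_graph and reduce_graph are hypothetical, and the stopping rule is only one reading of "no longer connected": deletion stops as soon as no two emblem terms share a direct edge.

```python
import math
from itertools import combinations

# Invented 2-D coordinates for a tiny vocabulary (illustration only).
vectors = {
    "river": (0.0, 0.0), "stream": (1.0, 0.2), "water": (0.8, 1.1),
    "ideas": (5.0, 5.0), "thoughts": (5.2, 4.0), "concepts": (4.1, 5.3),
    "ivory": (0.3, 5.1), "tusk": (1.2, 4.8),
}
emblem = ["river", "ideas", "ivory"]

def build_text_graph(vectors):
    """Complete graph: one edge per pair of word types, weighted by distance."""
    return {frozenset((a, b)): math.dist(vectors[a], vectors[b])
            for a, b in combinations(vectors, 2)}

def reduce_graph(edges, emblem):
    """Delete edges longest-first until no two emblem terms share an edge."""
    emblem_pairs = {frozenset(pair) for pair in combinations(emblem, 2)}
    reduced = dict(edges)
    for edge in sorted(edges, key=edges.get, reverse=True):
        if emblem_pairs.isdisjoint(reduced):
            break  # every direct emblem-to-emblem edge is gone
        del reduced[edge]
    return reduced

text_graph = build_text_graph(vectors)
reduced = reduce_graph(text_graph, emblem)

print(len(text_graph), "edges before,", len(reduced), "edges after")
for edge, length in sorted(reduced.items(), key=lambda item: item[1]):
    print(sorted(edge), round(length, 2))
```

Under this reading, every surviving edge is shorter than the shortest distance between any two emblem terms, so the reduced graph links only words that are closer to each other than the emblem terms are.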

Though I don’t know this to be true, I am assuming that the reduced text graph is still connected. If it isn’t, well, given what I have in mind, that’s OK. But let’s assume that it is connected.

What’s the average length of the shortest path between any two nodes in the reduced text graph? Now, let’s connect the words in our emblem together so that they form a connected graph. Let’s call the complete resulting graph the emblem-connected text graph. What’s the average length of the shortest path between any two nodes in the emblem-connected text graph? Is that value small enough that we may consider the emblem-connected text graph to be a small-world network?
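Here is a rough sketch of that before-and-after comparison, counting path length in hops rather than in embedding distance. The nine-node chain is a hypothetical stand-in for a reduced text graph, and joining every pair of emblem terms directly is my assumption about what "connect the words in our emblem together" means. Note also that the usual small-world test looks at clustering as well as path length, and compares both against a random graph of the same size; this sketch covers only the path-length half.

```python
from collections import deque
from itertools import combinations

def average_shortest_path(adj):
    """Mean hop count over all reachable ordered pairs of distinct nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def connect_emblem(adj, emblem):
    """Return a copy of the graph with every pair of emblem terms joined."""
    joined = {node: set(neighbors) for node, neighbors in adj.items()}
    for a, b in combinations(emblem, 2):
        joined[a].add(b)
        joined[b].add(a)
    return joined

# Stand-in reduced graph: a chain of nine hypothetical word types.
words = [f"w{i}" for i in range(9)]
reduced = {w: set() for w in words}
for a, b in zip(words, words[1:]):
    reduced[a].add(b)
    reduced[b].add(a)

emblem = ["w0", "w4", "w8"]  # hypothetical emblem terms, spread along the chain
before = average_shortest_path(reduced)
after = average_shortest_path(connect_emblem(reduced, emblem))
print(f"before: {before:.3f}  after: {after:.3f}")
```

If the emblem really does tie distant regions of the vocabulary together, the drop from the first figure to the second should be large relative to the handful of edges added.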

If so, then we have begun to make computational sense of one facet of Conrad’s craft in this text. One immediately wonders whether or not other texts have such emblems. I’m thinking, for example, that “A sunny pleasure-dome with caves of ice” is emblematic of Coleridge’s “Kubla Khan”. (My article, “Kubla Khan” and the Embodied Mind, explains why I conjecture that.) There must be other cases, lots of them I would think.

Virtual “Reading” – A crazy idea?

Consider the connected graph for the emblem phrase. It should be easy enough to calculate a central point for the phrase, no? But then, couldn’t we do that for any sentence or phrase? So, start at the beginning of the text and move sequentially through the text with a moving window of suitable length. Calculate the central point for the phrase within the window and trace the movement of successive centers through the text from beginning to end.
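A minimal sketch of such a moving window, again with invented two-dimensional vectors. The function name window_centroids is hypothetical, the centroid is taken as the "central point", and tokens without a vector (here, "my") are simply skipped, which is an assumption about how function words would be handled.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def window_centroids(tokens, vectors, size):
    """Centroid of each run of `size` consecutive tokens that have vectors."""
    known = [t for t in tokens if t in vectors]
    return [centroid([vectors[t] for t in known[i:i + size]])
            for i in range(len(known) - size + 1)]

# Invented coordinates, for illustration only.
vectors = {
    "intended": (1.0, 0.0),
    "ivory":    (0.0, 1.0),
    "station":  (1.0, 1.0),
    "river":    (0.0, 0.0),
}

tokens = "my intended my ivory my station my river".split()
trajectory = window_centroids(tokens, vectors, size=2)

for step, point in enumerate(trajectory):
    print(step, point)
```

Run over the whole text, the list of points is the trajectory; its distance from the emblem's center at each step is then a one-line calculation.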

Such a “reading” would not, of course, yield the computer anything like an understanding of the text. That’s not why it interests me. I’m interested in the form the trajectory traces through the space. For example, how does it move with respect to the center of the emblem? What about the volume spanned by the subgraph within this moving window? How does it expand and contract? And so forth.

Appendix: The Opening of the Nexus

Here’s the opening of paragraph 103, The Nexus. The emblem occurs at the end of this section.
"I laid the ghost of his gifts at last with a lie," he began suddenly. "Girl! What? Did I mention a girl? Oh, she is out of it—completely. They—the women, I mean—are out of it—should be out of it. We must help them to stay in that beautiful world of their own, lest ours gets worse. Oh, she had to be out of it. You should have heard the disinterred body of Mr. Kurtz saying, 'My Intended.' You would have perceived directly then how completely she was out of it. And the lofty frontal bone of Mr. Kurtz! They say the hair goes on growing sometimes, but this—ah specimen, was impressively bald. The wilderness had patted him on the head, and, behold, it was like a ball—an ivory ball; it had caressed him, and—lo!—he had withered; it had taken him, loved him, embraced him, got into his veins, consumed his flesh, and sealed his soul to its own by the inconceivable ceremonies of some devilish initiation. He was its spoiled and pampered favorite. Ivory? I should think so. Heaps of it, stacks of it. The old mud shanty was bursting with it. You would think there was not a single tusk left either above or below the ground in the whole country. 'Mostly fossil,' the manager had remarked disparagingly. It was no more fossil than I am; but they call it fossil when it is dug up. It appears these niggers do bury the tusks sometimes—but evidently they couldn't bury this parcel deep enough to save the gifted Mr. Kurtz from his fate. We filled the steamboat with it, and had to pile a lot on the deck. Thus he could see and enjoy as long as he could see, because the appreciation of this favor had remained with him to the last. You should have heard him say, 'My ivory.' Oh yes, I heard him. 'My Intended, my ivory, my station, my river, my—' everything belonged to him.

* Reading an earlier version of this post, where I hadn’t mentioned LSA, Matt Jockers pointed out to me that I haven't indicated how this distance is to be calculated. That's why we have others look over our work. Still, there must be techniques for estimating distances between word types in texts. Hence, LSA. I’m assuming – as I can’t quite follow the technical details – that it produces the information we’d need to perform the necessary calculations of distance.

All I'm after is a way of creating a reduced graph where the edges connect words that are more closely related than those in the emblem phrase. I don't really need to have numeric values associated with edge length as long as I have reason to believe that the pairs connected in the reduced text graph really are more closely related than those that aren't connected. For example, we might be able to use WordNet to determine which words are closely related and connect only those in the reduced text graph.

  1. Working out the relationship between my- him. gives you the relationship between my-everything...

    I can't make the calculation, can come up with a range, but it looks the most difficult one reading it cold.