Pattern-matching is much discussed these days in connection with deep learning in AI. Here's a post from July, 2014, where I discuss patterns more generally. [I was also counting down to my 2500th post. I'm now over 8600.]
* * * * *
When I posted From Quantification to Patterns in Digital Criticism I was thinking out loud. I’ve been thinking about patterns for years, and about pattern-matching as a computational process. I had this shoot-from-the-hip notion that patterns, as general as the concept is, deserve some kind of special standing in methodological thinking about so-called digital humanities – likely other things as well, but certainly digital humanities. And then I discovered that Rens Bod was thinking about patterns as well. And his thinking is independent of mine, as is Stephen Ramsay’s.
So now we have three independent lines of thought converging on the idea of patterns. Perhaps there’s something there.
But what? It’s not as though there’s anything new in the idea of patterns. It’s a perfectly ordinary idea. THAT’s not a disqualification, but I think we need something more if we want to use the idea of pattern as a fundamental epistemological concept.
From Niche to Pattern
In my previous patterns post, “Pattern” as a Term of Art, I argued that the biological niche is a pattern in the sense we need. It’s a pattern that arises between a species and its sustaining environment. Organisms define niches. While biologists sometimes talk of niches pre-existing the organisms that come to occupy them, that is just a rhetorical convenience.
That example is important because it puts patterns “out there” in the world rather than treating them as something that humans (only) perceive in the world. Now, though, it’s the human case that interests me: patterns that humans do see in the world. And we don’t necessarily regard all the patterns we see as being “real”, that is, as existing independently of our perception.
When we look at a cloud and see an elephant we don’t conclude that an elephant is up there in the sky, or that the cloud decided to take on an elephant-like form. We know that the cloud has its own dynamics, whatever they might be, and we realize that the elephant form is something we are projecting onto the world.
But that is something we learn. It’s not given in the perception itself. And that learning is guided by cultural conventions.
We see all kinds of things in the world. Not only does the mind perceive patterns, it seeks them out. What happens when we start to interact with the phenomena we perceive? That’s when we learn whether or not the elephant we saw is real or a projection.
With this in mind, consider this provisional formulation:
An observer defines a pattern over objects.
The parallel formulation for ecological niche would be:
A species defines a niche over the environment.
The pattern, the niche, exists in the relationship between a supporting matrix (the environment, an array of objects) and the organizing vehicle (the species, the observer). Just as there’s no way of identifying an ecological niche independently of specifying an organism occupying the niche, so there’s no way of specifying a (perceptual or cognitive) pattern independently of specifying a mind that charts the pattern.
As a practical matter, of course, we often talk of patterns simply as being there, in the world, in the data. And our ability to understand how the mind captures patterns is still somewhat limited. But if we want to understand how patterns function as epistemological primitives, then we must somehow take the perceiving mind into account.
The point of this formulation is to finesse the question of just what characteristic of some collection of objects makes them a suitable candidate for bearing a pattern. We can understand how patterns function as epistemological primitives without having to specify, as part of our inquiry, what characteristics an ensemble must have to warrant treatment as a pattern. We as epistemologists are not in the business of making that determination. That’s the job of a perceptual-cognitive system.
Our job is to understand how such systems come to accept some patterns as real while rejecting others. How does that happen? Through interaction, and the nature of that interaction is specific to the patterns involved.
Two Simple Examples: Animals and Stars
Let us consider some simple examples. Consider the patterns a hunter must use to track an animal: footprints, disturbed vegetation, sounds of animal movement, and so forth. The causal relationship between the animal and the signs in the pattern is obvious enough; the signs are produced by the animal’s motion. The hunter knows that the pattern is real when the animal is spotted. Of course, the animal may not be spotted even when the pattern is real. In the case of failure the hunter must judge whether the pattern was real and the animal simply got away, or whether the perceived pattern was mistaken.
Constellations of stars in the sky are a somewhat more complex example. That a certain group of stars is seen as Ursa Major, or the Big Dipper, is certainly a projection of the human mind onto the sky. The set of stars in a given constellation does not form a group organized by internal causal forces in the way that a planetary system does. The planets in such a system are held there by mutual gravitational attraction. The gravitational force of the central star would be the largest component in the field, with the planets exerting lesser force in the system.
But the stars in the Big Dipper are not held in that pattern by their mutual gravitational forces. Whatever that pattern is, it is not evidence of a local gravitational system among the constituents of the pattern. Rather, that pattern depends on the relationship between the observer and those objects. An observer at a different place in the universe, near one of the stars, for example, wouldn’t be able to perceive that pattern. And yet the stars have the same positions relative to one another and to the rest of the (nearby) universe.
Our knowledge of constellations is quite different from the hunter’s knowledge of tracking lore. One cannot interact with constellations in the way one interacts with animals. While one can pursue and capture or kill animals, one can’t do anything to constellations. They are beyond our reach. But we can observe them and note their positions in the sky. And we can use them to orient ourselves in the world and thus discover that they serve as reliable indicators of our position in space.
These two patterns attain reality in different ways. The forces that make the animal’s trail a real pattern are local ones having to do with the interaction between the animal and its immediate surroundings. The forces “behind” the constellations are those of the large-scale dynamics of the universe as “projected” onto the point from which the pattern is viewed.
A Case from the Humanities
Now let’s consider an example that’s closer to the digital humanities. Look at the following figure:
The red triangle is the pattern and I am defining it over the vertical bars. That is, I examined the bars and decided that they’re approximating a triangle, which I then superimposed on those bars. The bars preexisted the triangle.
I also created those bars, but through a process different and separate from the informal, intuitive process through which I created the triangular pattern. Each bar represents a paragraph in Joseph Conrad’s Heart of Darkness; the length of the bar is proportional to the number of words in the paragraph. The leftmost bar represents the first paragraph in the text while the rightmost bar represents the last paragraph in the text. The other bars represent the other paragraphs, in textual order from left to right.
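For readers who want to produce counts like these themselves, here is a minimal Python sketch of the counting step. It is an illustration under stated assumptions, not necessarily the procedure behind the chart: it assumes a plain-text file with paragraphs separated by blank lines, and it treats any run of non-whitespace characters as a word. The function name paragraph_lengths is hypothetical.

```python
import re


def paragraph_lengths(text: str) -> list[int]:
    """Return the word count of each paragraph, in textual order.

    Assumption: paragraphs are separated by one or more blank lines,
    and a "word" is any run of non-whitespace characters.
    """
    # Split on blank lines: a newline, optional whitespace, another newline.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # str.split() with no argument splits on any run of whitespace;
    # empty chunks (e.g. from an empty input) are skipped.
    return [len(p.split()) for p in paragraphs if p.strip()]


sample = "First paragraph has five words.\n\nSecond one.\n\n  \n\nThird paragraph here."
print(paragraph_lengths(sample))  # prints [5, 2, 3]
```

Different conventions for what counts as a word (hyphenated compounds, dashes, numerals) will shift individual counts slightly, so exact figures depend on those choices; the overall shape of the distribution should not.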
The bars vary quite a bit in length. The shortest paragraph in the text is only two words long while the longest is, I believe, 1502 words long. In any given run of, say, twenty paragraphs, paragraph lengths vary considerably, though there isn’t a single paragraph over 200 words long in the final 30 paragraphs or so.
But why, when the distribution of paragraph lengths is so irregular, am I asserting that the overall distribution has the form of a triangle? What I’m asserting is that a triangle is the envelope of the distribution. There are a few paragraphs outside the envelope, but the great majority are inside it.
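The envelope is a judgement imposed on the counts, not something computed from them. Still, once an analyst has settled on a triangle, it is easy to check how many paragraphs actually fall inside it. The Python sketch below assumes a triangle that rises linearly from a chosen floor at the first paragraph to a peak at a chosen apex, then falls back to the floor at the last paragraph. The function names and all four parameters (n, apex, peak, floor) are hypothetical; choosing them remains the analyst’s call.

```python
def triangle_envelope(n: int, apex: int, peak: float, floor: float) -> list[float]:
    """Height of a hypothetical triangular envelope at each of n positions.

    Rises linearly from `floor` at position 0 to `peak` at `apex`,
    then falls linearly back to `floor` at position n - 1.
    """
    env = []
    for i in range(n):
        if i <= apex:
            # Rising side; guard against division by zero when apex == 0.
            frac = i / apex if apex > 0 else 1.0
        else:
            # Falling side; here apex < i <= n - 1, so the divisor is positive.
            frac = (n - 1 - i) / (n - 1 - apex)
        env.append(floor + (peak - floor) * frac)
    return env


def share_inside(lengths: list[int], apex: int, peak: float, floor: float) -> float:
    """Fraction of paragraphs whose length does not exceed the envelope."""
    env = triangle_envelope(len(lengths), apex, peak, floor)
    inside = sum(1 for length, e in zip(lengths, env) if length <= e)
    return inside / len(lengths)


# Toy word counts, not data from Heart of Darkness.
toy = [10, 80, 30, 90, 200, 60, 70, 20, 5]
print(share_inside(toy, apex=4, peak=200, floor=20))  # 8 of 9 inside
```

A high share supports the claim that the triangle is a fair description of the envelope; it says nothing about whether the pattern is real in the deeper sense discussed below.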
The significant point, though, is that there is one longest paragraph and it is more or less in the middle. That paragraph is considerably longer (by over 300 words) than the next longest paragraphs, which are relatively close to it. The paragraphs toward the beginning and the end, the end especially, tend to be short.
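Locating the apex and measuring its margin over the runner-up is a simple ranking. A minimal Python sketch, assuming a list of per-paragraph word counts in textual order; the function name apex_summary is hypothetical.

```python
def apex_summary(lengths: list[int]) -> tuple[int, float, int]:
    """Locate the longest paragraph.

    Returns (index of the longest paragraph,
             its relative position from 0.0 = first to 1.0 = last,
             its margin in words over the second-longest paragraph).
    """
    if len(lengths) < 2:
        raise ValueError("need at least two paragraphs")
    # Rank paragraph indices from longest to shortest.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    top, runner_up = order[0], order[1]
    position = top / (len(lengths) - 1)
    margin = lengths[top] - lengths[runner_up]
    return top, position, margin


# Toy word counts, not data from Heart of Darkness.
print(apex_summary([10, 80, 30, 90, 200, 60, 70, 20, 5]))  # (4, 0.5, 110)
```

A position near 0.5 corresponds to "more or less in the middle", and the margin is the kind of gap described above.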
What we’d like to know, though, is whether this distribution is an accident, and so of little interest, or whether it is an indicator of a real process. In the first place I observe that, in my experience, paragraphs over 500 words long are relatively rare – this is the kind of thing that can be easily checked with the large text databases we now have. Single paragraphs of over 1000 words must be very rare indeed.
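Checking how rare very long paragraphs are amounts to pooling paragraph counts across many texts. A Python sketch, assuming each text in a corpus has already been reduced to a list of paragraph word counts; the corpus and the function name share_over are assumptions, and no figures from any actual database are implied.

```python
from collections.abc import Iterable


def share_over(texts: Iterable[list[int]], threshold: int) -> float:
    """Pooled fraction of paragraphs, across all texts, longer than `threshold` words."""
    total = 0
    over = 0
    for lengths in texts:
        total += len(lengths)
        over += sum(1 for n in lengths if n > threshold)
    # An empty corpus has no paragraphs, so report 0.0 rather than divide by zero.
    return over / total if total else 0.0


# Two toy "texts", not real data.
toy_corpus = [[100, 600, 50], [1200, 20]]
print(share_over(toy_corpus, 500))   # 0.4
print(share_over(toy_corpus, 1000))  # 0.2
```

Running it with thresholds of 500 and 1000 over a large corpus would give the two proportions the paragraph above speculates about.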
And that longest paragraph is quite special. It is very strongly marked. If you know Conrad’s story, then you know it centers on two men, Kurtz, a trader in the Congo, and Marlow, the captain of a boat sent to retrieve him. Marlow narrates the story, but it isn’t until we’re well into the story that Kurtz is even mentioned. And then we don’t learn much about him, just that he’s a trader deep in the interior and he hasn’t been heard from in a long time.
That longest paragraph is the first time we learn much about Kurtz. It’s a précis of his story. The circumstances in which Marlow gives us this précis are extraordinary.
Marlow’s narrative technique is simple; he tells events in the order in which they happened: his need for a job, how he got that particular job, his arrival at the mouth of the Congo River, and so forth. With that longest paragraph, however, Marlow deviates from chronological order.
He introduces this information about Kurtz as a digression from the story of his journey up the Congo River to Kurtz’s trading station. Some of what he tells us about Kurtz happened long before Marlow set sail; and some of it happened after the point in Marlow’s journey where he introduces this paragraph as a digression.
What brought on this digression? Well, Marlow’s boat was about a day’s journey from Kurtz’s camp when they were attacked from the shore. The helmsman was speared through the chest and fell bleeding to the deck. It’s at THAT point that Marlow interrupts his narrative to tell us about Kurtz – whom he had yet to meet. Once he finishes this most important digression he returns to his bleeding helmsman and throws him overboard, dead. Just before he does so he tells us that he doesn’t think Kurtz’s life was worth that of the helmsman who died trying to retrieve him.
That paragraph – its length, content, and position in the text – is no accident. That statement, of course, is a judgement, based only on my experience and knowledge as a critic, which have been shaped by the discipline of academic literary criticism. But it’s not an unreasonable judgement; it is of a piece with the thousands of such judgements woven into the fabric of our discipline.
Conrad may not have consciously planned to convey that information in the longest paragraph in the text, and to position that paragraph in the middle of his text, but whatever unconscious cognitive and affective considerations were driving his craft, they put that information in that place in the text and at that length. The apex of that triangle is real, not merely in the sense that the paragraph is that long, but in the deeper sense that it is a clue about the psychodynamic forces shaping the text.
Just what are those psychodynamic forces? I don’t know. The hunter can tell us in great detail about how the animal left traces of its movement over the land. Astronomers and astrophysicists can tell us about constellations in great detail. But the pattern of paragraph lengths in Heart of Darkness is a mystery.
* * * * *
Why do I consider this example at such length? For one thing, I’m interested in texts. Patterns in text are thus what most interest me.
Secondly, that example makes the point that description is one thing, explanation another. I’ve described the pattern, but I’ve not explained it. Nor do I have any clear idea of how to go about explaining it.
There’s a lot of that going around in the digital humanities. Patterns have been found, but we don’t know how to explain them. We may not even know whether or not the pattern reflects something “real” about the world or is simply an artifact of data processing.
Third, whereas much of the work in digital humanities involves data mining procedures that are difficult to understand, this is not like that. Counting the number of words in a paragraph is simple and straightforward, if tedious (even with some crude computational help). And yet the result is strange and a bit mysterious. Who’d have thought?
Note that I distinguish between the bar chart that displays the word counts and the pattern I, as analyst, impose on it. When I say that the envelope of paragraph length distribution is triangular, I’m making a judgement. That judgement didn’t come out of the word count itself. And when I say that that pattern is real, I’m also making a judgement, one that I’ve justified – if only partially – by discussing what happens in that longest paragraph and that paragraph's position in the text as a whole.
My sense of these matters is that, going forward, we’re going to have to get comfortable with identifying patterns we don’t know how to explain. We need to start thinking about, theorizing if you will, what patterns are and how to identify them.
* * * * *
I’ve written a good many posts on Heart of Darkness. I discuss paragraph length HERE and HERE. I’ve called that central paragraph the nexus and discuss it HERE. Here’s a downloadable working paper that covers these and other aspects of the text.
* * * * *
I’m on a countdown to my 2500th post. This is number 2495.