Saturday, June 23, 2012

Parts is Parts: What's a Sentence?

This is slightly revised from the third part of a post I published on The Valve in March 2010. The point is simple. Take a chunk of language (prose or poetry), remove all delimiters, punctuation, and capitals, and what happens? It gets more difficult to read. What does that suggest to you about the workings of the mechanisms underlying language?

* * * * *

Let's take a brief passage from David Patrick Columbia, the redoubtable chronicler of New York City's social set. The subject matter is not difficult or esoteric; it's quite ordinary. Still: What's he saying?
The courtly Mr. Ney is not a newcomer to Southampton in the 1990s he and his previous wife Judy they were divorced last year owned a house that had previously been owned by Anne McDonnell Ford Johnson also a Southamptonite and coincidentally a cousin of Pat Wood small worlds collide and happiness results congratulations to the happy couple
The passage has not been rendered unintelligible, but it’s a bit difficult to parse. In particular, there’s an interjection – “they were divorced last year” – that derails your parsing when you have no cue that it IS an interjection.

Without those simple little cues that delimit phrases (punctuation marks and capitalization), we’ve got to think explicitly about how to group the words into phrases. We can figure out what’s going on, but it takes some work. We have to try various groupings and see if they make sense. The fact is, many different groupings make sense locally. The problem is to find one that makes global sense for the passage.
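
To get a feel for how much trying is involved, here is a minimal sketch in Python that enumerates candidate groupings for just the last eleven words of the Columbia passage. The function name `segmentations` is my own invention for illustration, and the sketch only generates candidates; it makes no attempt to judge which ones make sense, which is the hard part.

```python
from itertools import combinations

def segmentations(words, n_breaks):
    """Yield every way to cut `words` into n_breaks + 1 consecutive groups."""
    gaps = range(1, len(words))  # a break can fall between any two words
    for cuts in combinations(gaps, n_breaks):
        bounds = (0, *cuts, len(words))
        yield [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]

# The tail of the Columbia passage, with delimiters removed.
words = ("small worlds collide and happiness results "
         "congratulations to the happy couple").split()

# Every reading of the tail as exactly two sentences.
two_sentence_readings = list(segmentations(words, 1))

# If the number of sentences is not known in advance, each gap is either
# a break or not, so the count of candidates doubles with every added word.
total_readings = 2 ** (len(words) - 1)
```

Several of the two-sentence candidates are locally plausible ("small worlds collide" is a perfectly good sentence on its own), and only one of them matches what Columbia wrote. Nothing in the enumeration itself says which.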

Now let's consider a literary text: Shakespeare’s Sonnet 129, in modernized spelling, but without punctuation, initial caps, or line breaks:
the expense of spirit in a waste of shame is lust in action and till action lust is perjured murderous bloody full of blame savage extreme rude cruel not to trust enjoy'd no sooner but despised straight past reason hunted and no sooner had past reason hated as a swallow'd bait on purpose laid to make the taker mad mad in pursuit and in possession so had having and in quest to have extreme a bliss in proof and proved a very woe before a joy proposed behind a dream all this the world well knows yet none knows well to shun the heaven that leads men to this hell
What’s lost? It’s one thing to restore the ends of sentences and the phrasing within them. But what about the sonnet form? If you didn’t know that this was a sonnet, how would you figure out the line divisions or even know to look for them? How would you detect the rhymes? Could you even detect the meter?

How would you program a computer to detect the rhymes? To detect rhyming words you have to compare them and you have to compare, not their spelling, but their sound. Spelling gives only a partial clue to the sound. Here's a version in which I've indicated the rhymes, but nothing else:
the expense of spirit in a waste of SHAME is lust in action and till action LUST is perjured murderous bloody full of BLAME savage extreme rude cruel not to TRUST enjoy'd no sooner but despised STRAIGHT past reason hunted and no sooner HAD past reason hated as a swallow'd BAIT on purpose laid to make the taker MAD mad in pursuit and in possession SO had having and in quest to have EXTREME a bliss in proof and proved a very WOE before a joy proposed behind a DREAM all this the world well knows yet none knows WELL to shun the heaven that leads men to this HELL
When pondering your algorithm for rhyme detection, take a look at the end. Notice that the word “well” appears twice, but only one of those appearances rhymes with the final word, “hell.” How would your algorithm detect that? What does it need to know in order to know that one comparison is irrelevant?
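
Here is one way the problem might be sketched in Python. Everything in it is an assumption for the sake of illustration: the tiny `PRON` table is hand-entered in ARPAbet-style notation (a real system would consult a full pronouncing dictionary), and "rhyme" is defined crudely as matching sounds from the last stressed vowel to the end of the word. It is not offered as the right algorithm, only as a way to see where the difficulty lies.

```python
# Toy pronunciations, hand-entered for this sketch. A trailing "1" marks
# the primary-stressed vowel, "0" an unstressed one.
PRON = {
    "shame": "SH EY1 M",     "blame": "B L EY1 M",
    "lust": "L AH1 S T",     "trust": "T R AH1 S T",
    "straight": "S T R EY1 T", "bait": "B EY1 T",
    "had": "HH AE1 D",       "mad": "M AE1 D",
    "so": "S OW1",           "woe": "W OW1",
    "extreme": "IH0 K S T R IY1 M", "dream": "D R IY1 M",
    "well": "W EH1 L",       "hell": "HH EH1 L",
}

def rhyme_part(word):
    """Return the sounds from the last primary-stressed vowel to the end."""
    phones = PRON[word].split()
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return tuple(phones[i:])
    return tuple(phones)

def rhymes(a, b):
    """Two different words rhyme if their rhyme parts sound the same."""
    return a != b and rhyme_part(a) == rhyme_part(b)

def spelling_match(a, b, n=3):
    """Naive comparison by spelling: do the last n letters agree?"""
    return a[-n:] == b[-n:]

# The final couplet as an unlineated stream of words.
couplet = ("all this the world well knows yet none knows well "
           "to shun the heaven that leads men to this hell").split()

# Comparing by sound alone, which positions rhyme with the last word?
candidates = [i for i, w in enumerate(couplet)
              if w in PRON and rhymes(w, couplet[-1])]
```

Comparing by sound gets "straight" and "bait" right where `spelling_match` fails, so that part is tractable given a pronunciation table. But `candidates` picks out both occurrences of "well," because sound comparison has no idea where the lines end. To rule out the first one, the program needs the line divisions, and the line divisions are exactly what was taken away.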

We do such things all but effortlessly. If we're listening to a skilled reader recite the poem, the sound shape of the speech stream clearly marks the rhyme points (i.e., the line ends). If we're reading from the page, the lineation takes care of that. But when you remove such cues . . .

How can so much be lost by changing so little?

* * * * *

Here's the Columbia passage with delimiters restored:
The courtly Mr. Ney is not a newcomer to Southampton. In the 1990s he and his previous wife Judy – they were divorced last year – owned a house that had previously been owned by Anne McDonnell Ford Johnson, also a Southamptonite and, coincidentally, a cousin of Pat Wood. Small worlds collide and happiness results. Congratulations to the happy couple!
