Sunday, August 19, 2018

Computation, Semantics, and Meaning: Adding to an Argument by Michael Gavin [#DH]

I've completed a new working paper. Title above. You can download it here:
Abstract, table of contents, and introduction below.


Abstract: Michael Gavin has published an article in which he uses ambiguity as a theme for juxtaposing close reading, a standard procedure in literary criticism, with vector semantics, a newer technique in statistical semantics with a lineage that includes machine translation. I take that essay as a framework for adding computational semantics, which also derives from machine translation, to the comparison. After recounting and adding to Gavin’s account of 18 lines from Paradise Lost, I use computational semantics to examine Shakespeare’s sonnet “The expense of spirit”. Close reading deals in meaning, which is ontologically subjective (in Searle’s usage), while both vector semantics and computational semantics are ontologically objective (though not necessarily objectively true, a matter of epistemology).

Introduction: I heard it in the Twitterverse
Warren Weaver, “Translation”, 1949
Multiple meaning
The distributional hypothesis
MT and Computational semantics
A connection between Weaver 1949 and semantic nets
Abstraction and topic analysis
Two kinds of computational semantics
The meanings of words are intimately interlinked
Concepts are separable from words
Models and graphics (a new ontology of the text?)
Comments on a passage from “Paradise Lost”
Some computational semantics for a Shakespeare sonnet
Three modes of literary investigation and two binary distinctions
Semantics and meaning
Two kinds of semantic model
Where are we? All roads lead to Rome
Appendix 1: Virtual Reading as a path through a high-dimensional semantic space
Appendix 2: The subjective nature of meaning
Walter Freeman’s neuroscience of meaning
From Word Space to World Spirit?

Introduction: I heard it in the Twitterverse

Michael Gavin recently published a fascinating article in Critical Inquiry, “Vector Semantics, William Empson, and the Study of Ambiguity”, one that has generated some buzz in the Twitterverse. I liked the article a lot. But two things threw me off balance: his use of the term “computational semantics” and a hole in his account of machine translation (MT). The first problem is easily remedied by using a different term, “statistical semantics”. The second could probably be dealt with by adding a paragraph or two pointing out that, while early work on MT failed and so was defunded, it did lead to a kind of computational semantics quite different from statistical semantics, work that has been influential in a variety of ways.

For Gavin’s immediate purpose in his article, however, those are minor issues. In a larger scope, things are different, and that is why I composed the posts I’ve gathered into this working paper. Digital humanists need to be aware of, and in dialog with, that other kind of computational semantics. Gavin’s article provides a useful framework for doing that.

Caveat: In this working paper I’m not going to attempt to explain Gavin’s statistical semantics from the ground up. He’s already done that. I assume a reader who is comfortable with such material.

Warren Weaver, “Translation”, 1949

Let us start with a famous memo Warren Weaver wrote in 1949. Weaver was director of the Natural Sciences division of the Rockefeller Foundation from 1932 to 1955. He collaborated with Claude Shannon on The Mathematical Theory of Communication, a book that popularized Shannon’s seminal work in information theory. Weaver’s 1949 memorandum, simply entitled “Translation”, is regarded as the catalytic document in the origin of machine translation (MT).

He opens the memo with two paragraphs entitled “Preliminary Remarks” (p. 1).
There is no need to do more than mention the obvious fact that a multiplicity of language impedes cultural interchange between the peoples of the earth, and is a serious deterrent to international understanding. The present memorandum, assuming the validity and importance of this fact, contains some comments and suggestions bearing on the possibility of contributing at least something to the solution of the world-wide translation problem through the use of electronic computers of great capacity, flexibility, and speed.

The suggestions of this memorandum will surely be incomplete and naïve, and may well be patently silly to an expert in the field - for the author is certainly not such.
But then there were no experts in the field, were there? Weaver was attempting to conjure a field of investigation out of nothing.

I think it important to note, moreover, that language is one of the seminal fields of inquiry for computer science. Yes, the Defense Department was interested in artillery tables and atomic explosions, and, somewhat earlier, the Census Bureau funded Herman Hollerith in the development of machines for data tabulation, but language study was important too.

A bit later in his memo Weaver quotes from a 1947 letter he wrote to Norbert Wiener, a mathematician perhaps best known for his work in cybernetics (p. 4):
When I look at an article in Russian, I say: “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.”

Wiener didn’t think much of the idea. Yet, as Gavin explains, crude though it was, that idea was the beginning of MT.

The code-breaker assumes the message is in a language they understand but has been disguised by a procedure that scrambles and transforms its expression. Once you’ve broken the code, you can read and understand the message. Human translators, however, don’t work that way. They read and understand the message in the source language (Russian, for example) and then re-express it in the target language (perhaps English).
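The code-breaking model can be made concrete with a toy substitution cipher. This is a minimal Python sketch, not anything from Weaver’s memo or Gavin’s article: the key (CIPHER) is an arbitrary, hypothetical letter permutation. Decoding is a purely mechanical inverse mapping. The output is readable only because the message was English all along, and nothing in the procedure requires understanding it, which is exactly what a human translator’s step from Russian to English is not.

```python
import string

# Hypothetical key: an arbitrary permutation of the 26 lowercase letters.
PLAIN = string.ascii_lowercase
CIPHER = "qwertyuiopasdfghjklzxcvbnm"

# Translation tables for the disguise and for its inverse.
ENCODE = str.maketrans(PLAIN, CIPHER)
DECODE = str.maketrans(CIPHER, PLAIN)


def encode(message: str) -> str:
    """Disguise an English message by swapping each letter for its cipher letter."""
    return message.lower().translate(ENCODE)


def decode(ciphertext: str) -> str:
    """Undo the swap; no knowledge of what the message means is involved."""
    return ciphertext.translate(DECODE)


secret = encode("This is really written in English")
print(secret)          # looks like "some strange symbols"
print(decode(secret))  # prints: this is really written in english
```

A real code-breaker would recover the key from the ciphertext (by letter-frequency analysis, say) rather than be handed it. The sketch skips that step, since the point is only what “decoding” amounts to once the key is known.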

Toward the end of his memo Weaver remarks (p. 11):
Think, by analogy, of individuals living in a series of tall closed towers, all erected over a common foundation. When they try to communicate with one another they shout back and forth, each from his own closed tower. It is difficult to make the sound penetrate even the nearest towers, and communication proceeds very poorly indeed. But when an individual goes down his tower, he finds himself in a great open basement, common to all the towers. Here he establishes easy and useful communication with the persons who have also descended from their towers.

Thus may it be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication - the real but as yet undiscovered universal language - and then re-emerge by whatever particular route is convenient.

The story of MT is, in effect, one in which researchers find themselves forced to reverse-engineer the entire tower in computational terms, all the way down to the basement, where we find not a universal language but a semantics constructed from and over perception and cognition.
