From MIT Technology Review, 2015:
The easiest way to think about words and how they can be added and subtracted like vectors is with an example. The most famous is the following: king – man + woman = queen. In other words, adding the vectors associated with the words king and woman while subtracting the one for man gives a vector equal to the vector associated with queen. This describes a gender relationship.
Another example is: paris – france + poland = warsaw. In this case, the vector difference between paris and france captures the concept of capital city.
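A minimal sketch of this vector arithmetic is below. It uses hand-crafted toy vectors purely for illustration; in practice one would use pre-trained embeddings such as word2vec or GloVe, where these analogies hold only approximately and are recovered by nearest-neighbour search.

```python
import numpy as np

# Toy vectors with made-up dimensions, chosen so the two famous analogies work.
vec = {
    "king":   np.array([1.0, -1.0, 0.0, 0.0]),
    "queen":  np.array([1.0,  1.0, 0.0, 0.0]),
    "man":    np.array([0.0, -1.0, 0.0, 0.0]),
    "woman":  np.array([0.0,  1.0, 0.0, 0.0]),
    "france": np.array([0.0,  0.0, 0.0, 1.0]),
    "paris":  np.array([0.0,  0.0, 1.0, 1.0]),
    "poland": np.array([0.0,  0.0, 0.0, 2.0]),
    "warsaw": np.array([0.0,  0.0, 1.0, 2.0]),
}

def analogy(a, b, c):
    """Return the word whose vector is closest to vec[a] - vec[b] + vec[c]."""
    target = vec[a] - vec[b] + vec[c]
    def cosine(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Exclude the three query words, as analogy evaluations normally do.
    candidates = {w: v for w, v in vec.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(vec[w], target))

print(analogy("king", "man", "woman"))       # queen
print(analogy("paris", "france", "poland"))  # warsaw
```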
Baldwin and co ask how reliable this approach can be and how far it can be taken. To do this, they compare how vector relationships change according to the corpus of words studied. For example, do the same vector relationships work in the corpus of words from Wikipedia as in the corpus of words from Google News or Reuters English newswire?
To find out, they look at the vectors associated with a number of well-known relationships between classes of words. These include the relationship between an entity and its parts, for example airplane and cockpit; an action and the object it involves, such as hunt and deer; and a noun and its collective noun, such as ant and army. They also include a range of grammatical links: a noun and its plural, such as dog and dogs; a verb and its past tense, such as know and knew; and a verb and its third person singular, such as accept and accepts.
The results make for interesting reading. Baldwin and co say that the vector differences that capture these relationships generally form tight clusters in the vector spaces associated with each corpus.
However, there are some interesting outliers where words have more than one meaning and so have ambiguous representations in these vector spaces. Examples in the third person singular cluster include study and studies, run and runs, and increase and increases, all words that can be either nouns or verbs, which distorts their vectors in these spaces.
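The clustering intuition can be checked directly: offsets for pairs in the same relation should be more similar to one another than to offsets from a different relation, while an ambiguous pair like study and studies tends to sit less tightly in its cluster. The sketch below assumes gensim is installed and can download a pre-trained GloVe model; the specific model name and word pairs are illustrative choices, not the paper's data.

```python
import gensim.downloader as api
import numpy as np

model = api.load("glove-wiki-gigaword-100")  # any pre-trained embedding would do

def offset(a, b):
    """Vector difference representing the relation a -> b."""
    return model[b] - model[a]

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Noun-plural offsets vs. verb past-tense offsets.
plural = [offset("dog", "dogs"), offset("cat", "cats"), offset("house", "houses")]
past   = [offset("know", "knew"), offset("take", "took"), offset("give", "gave")]

within = np.mean([cos(u, v) for i, u in enumerate(plural) for v in plural[i + 1:]])
across = np.mean([cos(u, v) for u in plural for v in past])
print(f"mean cosine within plural offsets:   {within:.2f}")
print(f"mean cosine plural vs past offsets:  {across:.2f}")

# A noun/verb-ambiguous pair will typically be a looser fit to either cluster.
ambiguous = offset("study", "studies")
print(f"study->studies vs plural cluster:    "
      f"{np.mean([cos(ambiguous, v) for v in plural]):.2f}")
```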
And it works in machine translation, but not always:
It’s worth noting that one of the pioneers and driving forces in this field is Google and its machine translation team. These guys have found that a vector relationship that appears in English generally also works in Spanish, German, Vietnamese, and indeed all languages.
That’s how Google does its machine translation. Essentially, it considers sentences in two languages to be equivalent if they occupy the same position in each language’s vector space. By this approach, the traditional meaning of a sentence is almost irrelevant.
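One concrete version of this idea is the linear mapping between embedding spaces described by Mikolov et al. in their translation-matrix work (a sketch of that idea, not of Google's production system): learn a matrix W from a seed dictionary so that source-language vectors land near their translations, then translate unseen words by nearest neighbour. The example below uses synthetic vectors so it runs without any downloads; real use would pair vectors from two separately trained monolingual models.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 50, 200

# Synthetic "English" vectors; a hidden orthogonal map stands in for the
# structure the two languages' embedding spaces share.
X = rng.normal(size=(n_pairs, dim))               # source-language vectors
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # hidden linear structure
Y = X @ Q + 0.01 * rng.normal(size=(n_pairs, dim))  # target-language vectors

# Fit W by least squares on a seed dictionary of known translation pairs.
train = slice(0, 150)
W, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)

# "Translate" held-out words: map into the target space, take nearest neighbour.
test = slice(150, 200)
mapped = X[test] @ W
sims = (mapped / np.linalg.norm(mapped, axis=1, keepdims=True)) @ \
       (Y / np.linalg.norm(Y, axis=1, keepdims=True)).T
predictions = sims.argmax(axis=1)
accuracy = np.mean(predictions == np.arange(150, 200))
print(f"nearest-neighbour translation accuracy on held-out pairs: {accuracy:.2f}")
```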
But because of the idiosyncratic nature of language, there are numerous exceptions. It is these that cause the problems for machine translation algorithms.
See Ekaterina Vylomova, Laura Rimell, Trevor Cohn, and Timothy Baldwin, “Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning,” arXiv:1509.01692v4 [cs.CL]:
Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of lexical relation types and different learning settings has not been evaluated. In this paper, we carry out such an evaluation in two learning settings: (1) spectral clustering to induce word relations, and (2) supervised learning to classify vector differences into relation types. We find that word embeddings capture a surprising amount of information, and that, under suitable supervised training, vector subtraction generalises well to a broad range of relations, including over unseen lexical items.
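The second setting in the abstract, supervised classification of vector differences into relation types, can be sketched with an off-the-shelf classifier. The rough example below uses scikit-learn's logistic regression on GloVe offsets; the word pairs, the embedding model, and the classifier are illustrative stand-ins, not the paper's actual data or setup.

```python
import gensim.downloader as api
import numpy as np
from sklearn.linear_model import LogisticRegression

model = api.load("glove-wiki-gigaword-100")

# Tiny training set of (word1, word2, relation) triples.
pairs = [
    ("dog", "dogs", "plural"), ("cat", "cats", "plural"),
    ("house", "houses", "plural"), ("car", "cars", "plural"),
    ("know", "knew", "past"), ("take", "took", "past"),
    ("give", "gave", "past"), ("see", "saw", "past"),
]

X = np.array([model[b] - model[a] for a, b, _ in pairs])  # offsets as features
y = [label for _, _, label in pairs]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Classify unseen pairs from their vector differences.
print(clf.predict([model["books"] - model["book"]]))  # likely ['plural']
print(clf.predict([model["ran"] - model["run"]]))     # likely ['past']
```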
Vector summing is how expressive/ideophonic words seem to work in Santali (Munda, in India). So this is likely a very ancient way of processing information.