Sunday, April 23, 2017

Google Translate and the Wondrous Limits of Deep Learning (AI)

Over at Language Log, Mark Liberman has had a series of posts bearing on the limitations of artificial intelligence. They take Google Translate as an example. As you may know, GT has recently switched over to a new system based on so-called “deep learning” technology. The new system gives results that are noticeably better than those of the older one, which was based on a different technology. Under certain “unnatural” conditions, however, it fails in a rather unusual and flamboyant way, and that is what Liberman’s posts are about.

Here’s the first one: What a tangled web they weave (Apr 15). If you type the Japanese characters ャス, they are translated as:
Us
If you double up, ャスャス, you get:
Chasus
Triple ャスャスャス:
Chasuau
And so on for ever more repetitions of the initial pair of characters:
Chanusasuasu
Jurasurus
Jurasurasusu
Jurasurasususu
Jurasurasususu
Jurasurasusususu
The sky chase supernatural
Worth seeing is not good. Jasusturus swasher
Soundtracks of the sun
It 's a good thing.
It 's a sort of a sweet sun.
It is a surprisingly good thing.
It is a surreptitious one,
It is a photograph of the photograph taken by a photograph
It is a photograph of the photograph taken by the photographer.
It is a photograph of the photograph taken by a photograph
It is a photograph of the photograph taken by a photograph
It is a photograph taken on the next page
This is a series of photographs of a series of photographs
This is a series of photographs of a series of photographs
This is a series of photographs of a series of photographs
This is a series of photographs of a series of photographs
This is a series of photographs of a series of photographs of a series of photographs
Liberman presents several examples of this phenomenon. In all cases the input consists of two or three characters repeated time and again. Where does all this “hallucinated” text (Liberman’s term) come from?
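The setup behind these lists is simple enough to sketch in a few lines. Here is a rough version of the experiment in Python; the translate() helper below is only a placeholder stub of my own (Liberman pasted the strings into the Google Translate web page), so you would have to swap in a real translation call to reproduce the results:

# Sketch of the repeated-character experiment. The translate() function
# is only a stand-in so the script runs on its own; in practice you would
# replace it with a call to an actual machine translation service.
def translate(text, src="ja", dest="en"):
    return text  # placeholder: echoes the input unchanged

seed = "ャス"  # the two-character seed from Liberman's first post
for n in range(1, 21):
    query = seed * n  # the seed repeated n times
    print(n, translate(query))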

Liberman’s second post, A long short-term memory of Gertrude Stein (Apr 16), contains further examples. Liberman begins his third post, Electric sheep, by quoting a note from another Language Log author, Geoff Pullum:
Ordinary people imagine (wrongly) that Google Translate is approximating the process we call translation. They think that the errors it makes are comparable to a human translator getting the wrong word (or the wrong sense) from a dictionary, or mistaking one syntactic construction for another, or missing an idiom, and thus making a well-intentioned but erroneous translation. The phenomena you have discussed reveal that something wildly, disastrously different is going on.

Something nonlinear: 18 consecutive repetitions of a two-character Thai sequence produce "This is how it is supposed to be", and so do 19, 20, 21, 22, 23, and 24, and then 25 repetitions produces something different, and 26 something different again, and so on. What will come out in response to a given input seems informally to be unpredictable […]

Type "La plume de ma tante est sur la table" into Google Translate and ask for an English translation, and you get something that might incline you, if asked whether you would agree to ride in a self-driving car programmed by the same people, to say yes. But look at the weird shit that comes from inputting Asian language repeated syllable sequences and you not only wouldn't get in the car, you wouldn't want to be in a parking lot where it was driving around on a test run. It's the difference between what might look like a technology nearly ready for prime time and the chaotic behavior of an engineering abortion that should strike fear into the hearts of any rational human.
That’s what’s so interesting. Under ordinary conditions Google Translate does a reasonable job. The translation is not of literary quality, nor would you want to use it for legal documents, but if you’re only after a rough sense of what’s going on, it’s OK. It’s (much) better than nothing. Whatever it is that it’s doing, however, is not at all what humans do when we translate. And that, I suppose, makes it all the more remarkable that it does anything at all.

Liberman goes on to reference a long and technical post by Andrej Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks, in which Karpathy shows that similar technology can be used to generate new text. For example, he trained a network on Shakespeare (in modern spelling) and produced output like this:

PANDARUS: 
Alas, I think he shall be come approached and the day 
When little srain would be attain'd into being never fed, 
And who is but a chain and subjects of his death, 
I should not sleep.

Second Senator: 
They are away this miseries, produced upon my soul, 
Breaking and strongly should be buried, when I perish 
The earth and thoughts of many states.  

DUKE VINCENTIO: 
Well, your wit is in the care of side and that.
  
Second Lord: 
They would be ruled after this chamber, and 
my fair nues begun out of the fact, to be conveyed, 
Whose noble souls I'll have the heart of the wars.
  
Clown: 
Come, sir, I will make did behold your worship.  

VIOLA: I'll drink it.

The machine clearly does not understand what it’s doing, but it manages a curious superficial plausibility.
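To make the mechanics a little more concrete, here is a minimal character-level LSTM sketch in the spirit of Karpathy’s char-rnn. It assumes PyTorch and uses a toy corpus of my own (none of this is Karpathy’s actual code); with so little data the samples will be gibberish, but the train-then-sample loop is the same basic idea:

# Minimal character-level language model: learn to predict the next
# character, then generate by feeding each prediction back in.
import torch
import torch.nn as nn

text = "Alas, I think he shall be come approached and the day. " * 20
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)
    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
seq_len = 32

for step in range(200):  # a few quick training steps on random slices
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)           # input characters
    y = data[i + 1:i + seq_len + 1].unsqueeze(0)   # next-character targets
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: start with one character and repeatedly draw the next one.
idx = torch.tensor([[stoi["A"]]])
state, out = None, "A"
for _ in range(200):
    logits, state = model(idx, state)
    probs = torch.softmax(logits[0, -1], dim=0)
    idx = torch.multinomial(probs, 1).view(1, 1)
    out += itos[idx.item()]
print(out)

Roughly the same next-item-prediction machinery, scaled up and trained on bilingual sentence pairs, is what sits underneath the new Google Translate, which is why its failures take the form of fluent-sounding but unmoored text.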

Keith Ellis made a comment on this post that is worth thinking about:
I have a slightly different take on this, which is that it's an example of why the recent assertions that strong AI is imminent are so absurd and ill-informed. What's really happening is that we're entering a sort of uncanny valley of weak AI, using things like neural nets and extremely large data sets, that is powerful enough to accomplish things we couldn't manage before and seems surprisingly and mysteriously "intelligent" but which is, in truth, restricted to a very limited problem domain as well as being (often) unacceptably fragile. We are nowhere close to genuine strong AI and all the claims to the contrary reveal a vast ignorance of the topic.

That said, I resolutely disagree with the Chinese Room argument here — which is to say, I think that human intelligence is not qualitatively different, only many orders of magnitude more complex, more layered, and trained on data within a problem domain spanning evolutionary time (that is to say, vastly greater). This is why it is, for our usual purposes, so much more reliable and robust. It's also why it, too, fails spectacularly and unpredictably at the margins. Furthermore, we've not even included what I think is a genuine cognitive layer of functional culture.

That contemporary machine language translation simultaneously is surprisingly successful and yet evidently fragile is not really an indictment of the underlying paradigm or ambition. It's an indictment, rather, of how poorly we humans ourselves understand the problem domain of language, translation, and cognition that we have trouble recognizing properly how these systems are both like our cognition and yet in relative terms, extraordinarily simplistic.
The most recent post in this series is The sphere of the sphere is the sphere of the sphere (Apr 22). I recommend them all to you. They’re worth thinking about.  Students of cognitive criticism might want to think about this technology in the context of topic modeling and vector representations of words.
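For anyone who wants to poke at that connection, here is a small illustration of topic modeling over bag-of-words document vectors, assuming scikit-learn; the toy corpus, topic count, and variable names are mine, not anything drawn from the posts:

# Each document becomes a word-count vector; LDA then groups co-occurring
# words into "topics". Toy example only -- real topic models need far more text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the translation of the sentence was surprisingly good",
    "the neural network was trained on a large corpus of text",
    "the photograph of the photograph was taken by a photographer",
    "deep learning models learn vector representations of words",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)  # documents as sparse count vectors
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print("topic", k, ":", ", ".join(top))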
