Tuesday, November 6, 2018

Mark Liberman (Language Log) on the perils of current machine translation

In May of 2015, I gave a talk at the Centre Cournot in Paris on the topic "Why Human Language Technology (almost) works", starting with a list of notable successes, including how well Google and Bing on-line translation did on the Centre Cournot's web site. But my theme required a few failures as well, and I found a spectacular set of examples when I tried a chapter-opening from a roman policier that I was reading (Yasmina Khadra, Le Dingue au Bistouri):
Il y a quatre choses que je déteste. Un: qu'on boive dans mon verre. Deux: qu'on se mouche dans un restaurant. Trois: qu'on me pose un lapin.

Google Translate: There are four things I hate. A: we drink in my glass. Two: we will fly in a restaurant. Three: I get asked a rabbit.

Bing Translate: There are four things that I hate. One: that one drink in my glass. Two: what we fly in a restaurant. Three: only asked me a rabbit.

Should be: There are four things I hate. One: that somebody drinks from my glass. Two: that somebody blows their nose in a restaurant. Three: that somebody stands me up.
These mistakes underline some general remaining difficulties. One: the treatment of pronouns. Two: the treatment of idioms that are not common in the bilingual training material. Three: the lack of common sense.
That was then (May 2015); this is now:
So three and a half years later, have these things improved? Apparently not:
Google today: There are four things I hate. One: that we drink in my glass. Two: we fly in a restaurant. Three: let me have a rabbit.

Bing today: There are four things I hate. One: Drink in my drink. Two: Let's fly to a restaurant. Three: Let me be asked a rabbit.
And unfortunately this is typical. Some aspects of such translations are very good, but the frequent mistakes spoil things.
And yet:
In this context, consider Hany Hassan et al., "Achieving Human Parity on Automatic Chinese to English News Translation", arXiv 6/29/2018:
Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft’s machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.
The paper is credible and impressive — and it describes a set of systems that work significantly better than what Microsoft now offers via online Bing translation, but are some time away from being generally deployed.

However, it's important to keep in mind that those new systems were trained on a large collection of carefully screened parallel texts (18 million sentence pairs), plus 7 million monolingual sentences on each side; and then tested on (withheld) examples from the same sources.

Applied to a different sort of material — conversational transcripts, or novels, or legal contracts, or scientific papers — the system would encounter new vocabulary, new constructions, and new concepts.
And what of common sense?


  1. That reminds me of my first experience with machine translation, back around 1998 or at UC Davis. The IT support guy (a philosophy major) who tended humanists (file under Sisyphean) gave me discs for a Spanish program the dept was no longer using. He thought I would find the compose program and keyboard useful but warned me about the translation program. He added, though, that it could be a fine source of entertainment: translate a text from English to Spanish and then translate the result back into English. Rinse and repeat. Do the same starting with a Spanish text.

    1. Thanks for the comment. MT is a strange beast. But it's the beast that got linguistic computing started (in parallel with Fr. Busa's work). Many of the corpus tools currently fashionable in DH originated from work in MT and CL (computational linguistics).