Writing in Language Log today, Mark Liberman begins:
In May of 2015, I gave a talk at the Centre Cournot in Paris on the topic "Why Human Language Technology (almost) works", starting with a list of notable successes, including how well Google and Bing on-line translation did on the Centre Cournot's web site. But my theme required a few failures as well, and I found a spectacular set of examples when I tried a chapter-opening from a roman policier that I was reading (Yasmina Khadra, Le Dingue au Bistouri):
Il y a quatre choses que je déteste. Un: qu'on boive dans mon verre. Deux: qu'on se mouche dans un restaurant. Trois: qu'on me pose un lapin.
Google Translate: There are four things I hate. A: we drink in my glass. Two: we will fly in a restaurant. Three: I get asked a rabbit.
Bing Translate: There are four things that I hate. One: that one drink in my glass. Two: what we fly in a restaurant. Three: only asked me a rabbit.
Should be: There are four things I hate. One: that somebody drinks from my glass. Two: that somebody blows their nose in a restaurant. Three: that somebody stands me up.
These mistakes underline some general remaining difficulties. One: the treatment of pronouns. Two: the treatment of idioms that are not common in the bilingual training material. Three: the lack of common sense.
That was then (May 2015); this is now:
So three and a half years later, have these things improved? Apparently not:
Google today: There are four things I hate. One: that we drink in my glass. Two: we fly in a restaurant. Three: let me have a rabbit.
Bing today: There are four things I hate. One: Drink in my drink. Two: Let's fly to a restaurant. Three: Let me be asked a rabbit.
And unfortunately this is typical. Some aspects of such translations are very good, but the frequent mistakes spoil things.
In this context, consider Hany Hassan et al., "Achieving Human Parity on Automatic Chinese to English News Translation", arXiv 6/29/2018:
Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft’s machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.
The paper is credible and impressive, and it describes a set of systems that work significantly better than what Microsoft now offers via online Bing translation, but are some time away from being generally deployed.
However, it's important to keep in mind that those new systems were trained on a large collection of carefully screened parallel texts (18 million sentence pairs), plus 7 million monolingual sentences on each side; and then tested on (withheld) examples from the same sources.
Applied to a different sort of material — conversational transcripts, or novels, or legal contracts, or scientific papers — the system would encounter new vocabulary, new constructions, and new concepts.
And what of common sense?