Saturday, March 4, 2017

Once again, language universals

Cathleen O'Grady on Ars Technica:
Language takes an astonishing variety of forms across the world—to such a huge extent that a long-standing debate rages around the question of whether all languages have even a single property in common. Well, there’s a new candidate for the elusive title of “language universal” according to a paper in this week’s issue of PNAS. All languages, the authors say, self-organise in such a way that related concepts stay as close together as possible within a sentence, making it easier to piece together the overall meaning. [...]

A lot has been written about a tendency in languages to place words with a close syntactic relationship as close together as possible. Richard Futrell, Kyle Mahowald, and Edward Gibson at MIT were interested in whether all languages might use this as a technique to make sentences easier to understand.

The idea is that when sentences bundle related concepts in proximity, it puts less of a strain on working memory. For example, adjectives (like “old”) belong with the nouns that they modify (like “lady”), so it’s easier to understand the whole concept of “old lady” if the words appear close together in a sentence.
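To make "dependency length" concrete, here's a quick sketch (mine, not from the article or the paper): given a hand-written dependency parse, where each word points to the word it syntactically depends on, the sentence's total dependency length is just the sum of the distances between dependents and their heads. The sentence and head indices below are invented for illustration.

```python
# Toy illustration of dependency length: for each word, its "head" is the
# index of the word it depends on syntactically (None for the root).
# A dependency's length is the distance between the two positions; the
# sentence's total dependency length is the sum over all words.

def total_dependency_length(heads):
    """Sum of |dependent position - head position| over all non-root words."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

# "The old lady smiled": "The" and "old" modify "lady", which depends
# on the verb "smiled" (the root).
words = ["The", "old", "lady", "smiled"]
heads = [2, 2, 3, None]  # position of each word's head

print(total_dependency_length(heads))  # 2 + 1 + 1 = 4
```

The intuition from the quote falls out directly: if "old" were separated from "lady" by intervening words, its dependency would get longer, and the total would grow.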

The original research:

Large-scale evidence of dependency length minimization in 37 languages
Richard Futrell, Kyle Mahowald, and Edward Gibson

PNAS, August 18, 2015, vol. 112, no. 33, pp. 10336–10341
doi: 10.1073/pnas.1502134112

Significance

We provide the first large-scale, quantitative, cross-linguistic evidence for a universal syntactic property of languages: that dependency lengths are shorter than chance. Our work supports long-standing ideas that speakers prefer word orders with short dependency lengths and that languages do not enforce word orders with long dependency lengths. Dependency length minimization is well motivated because it allows for more efficient parsing and generation of natural language. Over the last 20 y, the hypothesis of a pressure to minimize dependency length has been invoked to explain many of the most striking recurring properties of languages. Our broad-coverage findings support those explanations.

Abstract

Explaining the variation between human languages and the constraints on that variation is a core goal of linguistics. In the last 20 y, it has been claimed that many striking universals of cross-linguistic variation follow from a hypothetical principle that dependency length—the distance between syntactically related words in a sentence—is minimized. Various models of human sentence production and comprehension predict that long dependencies are difficult or inefficient to process; minimizing dependency length thus enables effective communication without incurring processing difficulty. However, despite widespread application of this idea in theoretical, empirical, and practical work, there is not yet large-scale evidence that dependency length is actually minimized in real utterances across many languages; previous work has focused either on a small number of languages or on limited kinds of data about each language. Here, using parsed corpora of 37 diverse languages, we show that overall dependency lengths for all languages are shorter than conservative random baselines. The results strongly suggest that dependency length minimization is a universal quantitative property of human languages and support explanations of linguistic variation in terms of general properties of human information processing.
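The paper's core comparison is between observed dependency lengths and random reorderings of the same dependency trees. As a rough illustration of that logic (not the authors' actual method; their baselines are more conservative, e.g. restricted to projective word orders), one could shuffle the words of a sentence while preserving its dependency structure and ask whether the real order comes out shorter:

```python
import random

def total_dependency_length(heads):
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

def random_baseline(heads, trials=10_000):
    """Mean total dependency length over random reorderings of the words.

    NOTE: a free-permutation baseline for illustration only; the paper
    uses more conservative baselines (e.g. random projective orders).
    """
    n = len(heads)
    totals = []
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)  # perm[i] = new position of original word i
        new_heads = [None] * n
        for i, h in enumerate(heads):
            if h is not None:
                new_heads[perm[i]] = perm[h]
        totals.append(total_dependency_length(new_heads))
    return sum(totals) / len(totals)

heads = [2, 2, 3, None]                # "The old lady smiled" from above
print(total_dependency_length(heads))  # observed: 4
print(random_baseline(heads))          # random baseline: ~5 on average
```

Even on a four-word toy sentence the observed order beats the random average; the paper's contribution is showing, on parsed corpora of 37 languages, that this gap holds at scale.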
