Wednesday, September 18, 2019

Segmenting the language stream [words are tricky]

It is sometimes useful to reflect on the fact that, aurally, the speech stream is continuous, not segmented. The segmentation is something we impose on the stream through cognitive mechanisms – that, I argue, is the computational foundation of language. Thus early forms of writing often consisted of a continuous stream of characters, with no segmentation into separate words. Victor Mair has a post at Language Log that speaks to this, "The challenging importance of spacing in Korean":
Who'da thunk it? – spacing is the most difficult aspect of Korean writing. One might have thought it would be a simple task, that word spacing / separation is innate for all speakers of a given language. Apparently that is not so.

In Hanyu Pinyin, it is called fēncí liánxiě 分詞連寫 ("word division; parsing"). Of course, it has its problems, but we do have rules to guide us, viz., zhèngcífǎ 正詞法 ("orthography").

This morning in my "Language, Script, and Society in China" course, I embarked on a discussion of the difference between zì 字 ("character") and cí 詞 ("word"). Although this seems like a simple, straightforward question, it is always one of the most difficult topics encountered in the course — especially for students of Chinese background. It took me a whole semester to get the idea across to the 72 very smart students in my language studies class at the University of Hong Kong in 2002-2003. Even at the conclusion of the semester, there were still some of the students who just couldn't comprehend the distinction.
Be sure to read the comments.

Addendum: In fact, I'll reprint one of them in full. Victor Mair, who started the thread, posts this on behalf of an unnamed colleague:
Spacing–word division–assumes shared knowledge among users of what constitutes a language's words. This is not a trivial matter, and Korean linguists, lexicographers and publishers have been working the issue for decades.

The basic problem, as one of the commentators intimates, is that words, like (morpho)phonemic spelling, are an artifact of writing. They are not a given to be plucked from someone's brain. Orthography takes it upon itself to regularize (adjudicate) the intuitions users have about what constitutes the lexical units of their language, which are far from uniform and constantly shifting. Korean lacked that tradition and is catching up, although in a sense all written languages that use word division are continuously "catching up." I don't see it as a major problem, or a problem at all.

What I do find problematic in Asian languages is fluid "standards" for sentence representation, namely, where the period goes. This is not an issue (for me) in Korean, probably because the language does use word division, which enforces a discipline on writers that carries beyond the identification of (agreement on) word boundaries to one's whole approach to sentence structure. Chinese sentences–the text between periods–are often by western standards two sentences, five sentences, or partial sentences. Japanese writers also seem to have more liberty in this regard than a westerner would expect. Vietnamese sentences, in earlier novels at least, end or don't end seemingly at whim. And I question if Tibetans even have the concept of "sentence."

I've been out of this field for too long so my thinking may be dated. But there may be psycholinguistic issues at play here that merit serious study.
This too is relevant to the issue of computation in the mind. And so: I've just been thinking about this. And I'm wondering if the problem isn't similar to the problem that adolescent and post-adolescent second language learners have with pronunciation. I don't know what the current literature says about that, but in the past I've seen it attributed to a lack of neuro-plasticity. I don't find that terribly convincing. My intuition – and it's no more than that – is that the problem is more like conscious access. For some reason conscious access to (something in) the aural-motor channel has been, if not lost, somewhat degraded.

Could the same thing be going on in the transfer of segmentation from the aural-motor channel to the visuo-orthographic?
