Sunday, December 6, 2015

In 16 books Dickens used 37,504 words, but just what that means is subject to various qualifications. For example:
I downcased everything, split hyphenated words, and counted only all-alphabetic tokens. Of course the 37,504 types constitute a smaller number of "word families" — thus we have
504 dog
200 dogs
32 dogged
27 doggedly
5 dogging
3 doglike
2 doggedness
The meaning and usage of dogged and dogging are far from 100% predictable from the meaning and usage of dog, so there's a question about where to draw the boundaries of dog's "word family".
That's from a recent post by Mark Liberman at Language Log. It's about the size of people's vocabularies, which is surprisingly difficult to estimate for a variety of reasons. On the one hand, what is a word? On the other hand, however you define it, how do you count them? You can't just ask people to list all the words they know. And then: what counts as knowing a word?
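To make the counting problem concrete, here is a minimal Python sketch of the procedure Liberman describes for Dickens: downcase everything, split hyphenated words, and keep only all-alphabetic tokens. The splitting rules are my assumptions, since the quote doesn't say exactly how tokens were delimited: I split on whitespace and hyphens and strip ASCII punctuation from the edges of each token. The helper `naive_family` is hypothetical and deliberately crude. It lumps together every type that starts with a given string, which shows why the boundaries of a "word family" are hard to draw: a prefix test for "dog" also sweeps in "dogma".

```python
import re
import string
from collections import Counter


def count_types(text: str) -> Counter:
    """Count word types: downcase, split hyphenated words, keep all-alphabetic tokens.

    Assumption: tokens are delimited by whitespace and hyphens, and ASCII
    punctuation is stripped from each token's edges. A token like "dog's"
    keeps its inner apostrophe, so it is not all-alphabetic and is dropped.
    """
    counts = Counter()
    for raw in re.split(r"[\s\-]+", text.lower()):
        token = raw.strip(string.punctuation)
        if token.isalpha():  # empty strings and mixed tokens are skipped
            counts[token] += 1
    return counts


def naive_family(counts: Counter, stem: str) -> dict:
    """Hypothetical, crude grouping: every type beginning with `stem` is one "family"."""
    return {word: n for word, n in counts.items() if word.startswith(stem)}


if __name__ == "__main__":
    text = "The dog barked. Two dogs, doggedly dog-like, ignored the dogma."
    counts = count_types(text)
    print(len(counts), "types,", sum(counts.values()), "tokens")
    print(naive_family(counts, "dog"))
```

On the sample sentence this finds 9 types among 11 tokens, and the "dog" family comes back as dog, dogs, doggedly, and dogma. Deciding by hand which of those really belong together is exactly the judgment call Liberman describes.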

Keeping all that in mind, it seems that educated Americans know between 40,000 and 70,000 words.
