Enthusiasms

Enthusiasms is an edited stream of consciousness, by Simen.

How much meaning is there in a word?

I have an interest in the fundamentals of language and questions about the nature of meaning. One question that seems easy but really isn’t is this: how much meaning is there in a word? How do you quantify meaning? According to Time, this question was recently tackled by a team of scientists. Except Time won’t tell you what the study was called or who the researchers were, only that it was published in Language. (What’s the point of referencing a scholarly article when you provide no information that can be straightforwardly traced to the source? Here’s what it does: it gives the impression of academic authority, without actually performing the function a reference is supposed to. It’s a ridiculous fucking practice.) A few failed googles later, I’ve found what appears to be the article in question. It’s called A Cross-Language Perspective on Speech Information Rate, by François Pellegrino, Christophe Coupé and Egidio Marsico of the University of Lyon.

Here’s some of what Time has to say about the study:

The investigators next counted all of the syllables in each of the recordings, and further analyzed how much meaning was packed into each of those syllables. A single syllable word like “bliss,” for example, is rich with meaning — signifying not ordinary happiness but a particularly serene and rapturous kind. The single syllable word “to” is less information-dense. And a single syllable like the short i sound, as in the word “jubilee,” has no independent meaning at all.

With this raw data in hand, the investigators crunched the numbers together to arrive at two critical values for each language: The average information density for each of its syllables and the average number of syllables spoken per second in ordinary speech. Vietnamese was used as a reference language for the other seven, with its syllables (which are considered by linguists to be very information dense) given an arbitrary value of 1.

For all of the other languages, the researchers discovered, the more data-dense the average syllable is, the fewer of those syllables had to be spoken per second — and the slower the speech thus was.

This is the sort of study that is very boring, but also very important: it confirms something we were pretty sure we knew. But the Time article gives the impression that, in confirming that it takes about the same amount of time to say the same thing in different languages, the scientists have also found a way to quantify the amount of meaning in any given syllable. That would be a very interesting but probably impossible result. Instead, they’ve done something much simpler and less interesting. They’ve translated a number of texts, originally written in English, into seven languages, and assumed that each translation conveys the same amount of meaning as the original. Then they’ve counted the number of syllables in each corpus. Instead of finding some absolute measure of semantic density, they’ve chosen a relative measure: each language is compared to Vietnamese. So their “density of meaning” for a language L is precisely this: the mean, over all the texts, of V/N, where V is the number of syllables in the Vietnamese version of a text and N the number of syllables in the same text in language L. Nothing so radical as determining the amount of meaning in a word like “bliss”, then.
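To make the point concrete, here’s a rough Python sketch of the relative measure as I’ve described it. The syllable counts are invented for illustration (they are not data from the study), and the function name relative_density is mine, not the researchers’; their actual procedure may involve details I’m glossing over. Vietnamese scores exactly 1 against itself, and a language that needs twice as many syllables to say the same thing scores 0.5.

```python
from statistics import mean

# Hypothetical syllable counts for the same three texts, one count per text.
# These numbers are made up for illustration; they are not the study's data.
vietnamese = [100, 120, 90]
language_l = [200, 240, 180]  # a made-up language needing twice as many syllables


def relative_density(ref_counts, lang_counts):
    """Mean over texts of V/N: reference (Vietnamese) syllables divided by
    the syllables the same text takes in language L."""
    if len(ref_counts) != len(lang_counts):
        raise ValueError("need syllable counts for the same set of texts")
    return mean(v / n for v, n in zip(ref_counts, lang_counts))


print(relative_density(vietnamese, language_l))  # 0.5
print(relative_density(vietnamese, vietnamese))  # 1.0, the reference by definition
```

Note what goes into that number: syllable counts and an assumption that the translations are equivalent. Nothing in it depends on what any particular syllable means.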

Back in 2009, I went over some of the philosophical problems with assigning a number to the amount of meaning in any given word or morpheme. At the time, what prompted me to post was the Guinness World Records entry for “most succinct word”, which implies that there is some rational and precise way to quantify how much is said in a word. The word was mamihlapinatapai, a word in the Yaghan language that allegedly means “a look shared by two people, each wishing that the other will offer something that they both desire but are unwilling to do.” This is, no doubt, a succinct word, but it seems imprecise to call it the “most succinct word”, because there is no way to quantify succinctness across languages. The fact that it takes many words in English to say what takes one word in Yaghan doesn’t really mean much.

English is on the analytic end of the spectrum, which means that it has few morphemes (the smallest speech-units that have meaning) per word. Yaghan is presumably synthetic (i.e., has many morphemes per word). It may also be fusional (morphemes carry more than one meaning). (As an example of linguistic fusion, the single-morpheme word “him” is a third person singular masculine oblique pronoun. It does not consist of separate morphemes indicating third person, singular number, masculine gender, and the oblique case.) Highly synthetic, fusional languages will tend to have words like mamihlapinatapai, where a single word makes distinctions along many semantic axes at once. And although I don’t speak any such languages, it would surprise me greatly if many other languages didn’t have single words that take as many English words to translate as mamihlapinatapai does.

Meaning is tricky business. Translation is tricky business. There is very rarely a one-to-one correspondence between words in two different languages, particularly when the two languages do not share a common ancestor. Even very basic words like “man” or “cat” could have subtly different connotations that make it hard to insist that the words correspond one-to-one. Unless you have a complete, fundamental account of what meaning is, and what kind of basic building blocks it’s made of, it’s hard to argue rigorously for even the commonsense notion that a word like “cat” packs less meaning than a word like “mamihlapinatapai”.

If there is to be such a basic account of meaning, I think we’ll find it in the nature of thought, in the way the brain encodes “mentalese”, whatever the nature of that turns out to be. And then you’re butting up against the hard problem of consciousness, which is very hard indeed. There were lots of interesting directions that Time article could have gone in, but whether due to sloppy journalism, lack of imagination, or editorial guidelines, it didn’t go anywhere.

Sep 11, 2011