Wednesday, July 15, 2009

Deciphering an unknown language...

This post is a response to Marian's question of a few days ago.

I have seen some of the instances Marian refers to, of humans trying to decipher alien communication beginning with numbers. David Marshall also has a good point about chemistry. In effect, the idea behind these approaches is to try to find understanding by starting with something we know we must have in common.

I'm not convinced that these things would work in actual practice, simply because they rely in large part on our own perception of the separation of objects.

However, in SF generally there is a degree of commonality embedded in the premise. Sometimes the only assumption is that of common perception of objects and numbers. Sometimes it's greater, as when an author asks you to accept a computer that is able to process and decipher language, or a machine translator (or Babel fish!).

In her Xenolinguist stories, Sheila Finch often deals with thorny questions related to how alien contexts and physiologies would influence language form - and in fact, she'll be guest blogging here in the next day or so to talk a bit about her design of the Xenolinguists' universe.

In my own stories, I like to pick up at a point where linguists feel they've already got the language largely under control, and deal with the fine points of culture, dialect, and the significance of language.

But the question was how to get past numbers into deciphering more of a language. The key, I think, is context.

The folks who write about the numbers-based communication strategy are setting up a very hard problem, dealing as they are with signals sent at a distance. This kind of communication, much like writing, is highly divorced from the original context of communication.

Humans tend to see objects as separate from one another and assign words to them. This is the vocabulary, or lexicon. When humans are working in a close context of togetherness and shared activity, a single word is often enough to evoke the correct meaning, as when the doctor says "scalpel" over the patient in the operating room. The further one moves from immediate shared context, the more language features become necessary to get the message across. This is essentially the origin of grammar - pieces of sound (in the human case) stuck in with the object information to show relation, in cases when the relation is not immediately clear to both parties. Writing takes this phenomenon to an extreme, while transmission of data over interstellar distances pushes it even farther.

Humans have powerful language-processing mechanisms built into their brains. We are able to track sounds with great accuracy, identifying separate clicks even when they are delivered at quite high speeds - I saw an article on this in the New York Times while in Chicago a few weeks ago. When light flashes occur at the same rate, the eye perceives them as continuous and is unable to separate them.

Another thing humans are able to do is perceive frequency of occurrence. If I were to give you two words, say, "true" and "identifiable," you could probably give me an accurate ranking of which one occurs more commonly in English, right off the top of your head. This is a tremendous advantage in deciphering language, whether it be auditory or visual.
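To make the frequency idea concrete, here is a minimal Python sketch of the same ranking done by counting. The function name `rank_by_frequency` and the tiny `sample` text are my own inventions for illustration; a real comparison would need a large corpus, not three clauses.

```python
from collections import Counter
import re

def rank_by_frequency(text, words):
    """Return `words` sorted from most to least frequent in `text`."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Words missing from the text count as zero occurrences.
    return sorted(words, key=lambda w: counts[w], reverse=True)

# Hypothetical sample text, made up for this example.
sample = (
    "It is true that the signal repeats. A true pattern is identifiable, "
    "but it is also true that noise can look like a pattern."
)
ranking = rank_by_frequency(sample, ["identifiable", "true"])
print(ranking)
```

With an unknown language we would not know where the word boundaries are, so the tokenizing step here is the part that assumes the most.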

So when we encounter communication, we try to identify patterns that repeat in similar contexts. If we went to an alien planet, the best approach would probably be to get as close as possible to aliens and begin taking samples of their language with as much contextual information as possible - physical, temporal, and social. Since those contextual cues may not be the same ones we typically use, having sophisticated sensors and computer power at our fingertips would probably be a great help. Once we figure out correspondences, we can start assigning tentative meanings. Having a local resident to test these meanings on, even if it means playing recordings of things that human vocal tracts find unpronounceable, is an indispensable step.
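The step of matching repeated signals to repeated contexts can be sketched in a few lines of Python. Everything here is hypothetical: the signals ("zub", "kel" and so on), the context labels, and the scoring rule (overlap between the set of observations where a signal occurs and the set where a context feature occurs) are assumptions I chose to keep the example small. The output is only a list of guesses of the kind you would then test on a local resident.

```python
from collections import defaultdict

def tentative_meanings(observations):
    """Guess, for each signal, the context feature it co-occurs with best.

    `observations` is a list of (signals, context_features) pairs.
    """
    signal_seen = defaultdict(set)   # signal -> indices of observations containing it
    feature_seen = defaultdict(set)  # feature -> indices of observations containing it
    for i, (signals, features) in enumerate(observations):
        for s in signals:
            signal_seen[s].add(i)
        for f in features:
            feature_seen[f].add(i)

    guesses = {}
    for s, s_idx in signal_seen.items():
        def overlap(f):
            # Shared observations divided by all observations with either one.
            f_idx = feature_seen[f]
            return len(s_idx & f_idx) / len(s_idx | f_idx)
        # Sorting the features first makes tie-breaking deterministic.
        guesses[s] = max(sorted(feature_seen), key=overlap)
    return guesses

# Made-up field notes: which signals were heard in which situations.
observations = [
    ({"zub", "kel"}, {"food", "morning"}),
    ({"zub", "tor"}, {"food", "evening"}),
    ({"kel", "mip"}, {"water", "morning"}),
    ({"tor", "mip"}, {"water", "evening"}),
]
guesses = tentative_meanings(observations)
print(guesses)
```

Note how much the sketch takes for granted: that we already know which features of the situation matter, and that the signals have been segmented for us. Those are exactly the parts that would take years.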

Even so, this process would certainly take years.

The last thing I'll add tonight is a couple of examples of context and frequency distinctions, from the area of phonology. Linguists talk about something called a "minimal pair," which is the diagnostic test for a phoneme. It goes roughly like this: if you change one single sound in a word, and the meaning of the word changes, then that single sound is a phoneme. The two words, one with one sound and one with the other, are called a minimal pair.
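The test described above is mechanical enough to sketch in Python. This is a simplified illustration under my own assumptions: each word is given as a tuple of sound segments in rough IPA, every entry in the lexicon is assumed to have a distinct meaning, and the function names are invented for the example.

```python
from itertools import combinations

def is_minimal_pair(a, b):
    """True if two transcriptions have equal length and differ in exactly one segment."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

def find_minimal_pairs(lexicon):
    """Return (word1, word2, sound1, sound2) for every minimal pair in the lexicon."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if is_minimal_pair(t1, t2):
            # Pick out the single position where the two words differ.
            s1, s2 = next((x, y) for x, y in zip(t1, t2) if x != y)
            pairs.append((w1, w2, s1, s2))
    return pairs

# A tiny hand-transcribed lexicon (approximate IPA, for illustration only).
lexicon = {
    "bit": ("b", "ɪ", "t"),
    "bat": ("b", "æ", "t"),
    "pit": ("p", "ɪ", "t"),
    "star": ("s", "t", "ɑ", "r"),
}
pairs = find_minimal_pairs(lexicon)
print(pairs)
```

Each pair found is evidence that the two differing sounds are separate phonemes in the language being studied.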

"bit" and "bat" show that short "i" and short "a" are separate phonemes from each other.

On the other hand, aspirated "t" and non-aspirated "t" both occur in English, but are allophones, or two different forms of the same phoneme, because which one appears can be predicted by a rule, and they never make the difference between separate words.

"tar" contains aspirated "t"
"star" contains non-aspirated "t" - but the non-aspirated "t" is present because of the "s" preceding it.

These two types of "t" are phonemes in at least one Indian language (I can't cite a minimal pair, however, not knowing Indian languages well myself; perhaps one of my readers could give me an example).
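For the English case, the "related by a rule" idea can be checked by looking at where each sound occurs. Below is a minimal Python sketch that treats the environment of a sound as just the segment before it (with "#" for the start of a word). That is a deliberate simplification on my part, as are the function names and the four-word sample; real English "t" has more variants and more environments than this.

```python
def environments(words, phone):
    """Collect the segments that precede `phone` ('#' marks the start of a word)."""
    envs = set()
    for segs in words:
        for i, seg in enumerate(segs):
            if seg == phone:
                envs.add(segs[i - 1] if i > 0 else "#")
    return envs

def in_complementary_distribution(words, phone_a, phone_b):
    """True if both sounds occur and never share a preceding environment."""
    env_a = environments(words, phone_a)
    env_b = environments(words, phone_b)
    return bool(env_a) and bool(env_b) and env_a.isdisjoint(env_b)

# Approximate transcriptions; "tʰ" is aspirated "t", "t" is non-aspirated.
words = [
    ("tʰ", "ɑ", "r"),        # tar
    ("s", "t", "ɑ", "r"),    # star
    ("tʰ", "ɪ", "p"),        # tip
    ("s", "t", "ɪ", "k"),    # stick
]
t_result = in_complementary_distribution(words, "tʰ", "t")
vowel_result = in_complementary_distribution(words, "ɑ", "ɪ")
print(t_result, vowel_result)
```

Sounds that never share an environment are candidates for allophones of one phoneme, while sounds that turn up in the same environment, like the two vowels here, are candidates for separate phonemes. In a language where the two kinds of "t" are phonemes, both would show up in the same position and the first check would fail.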

I hope that sheds a little light on this process; I certainly have enormous respect for the linguists who have gone out into the field and taken on languages previously unknown to them. Marian, you can always let me know if you have other questions.

1 comment:

  1. A couple piddly questions of no real importance (and possibly no real relevance either):

    (1) Are you familiar with the book "A for Andromeda" by Fred Hoyle and John Elliot, and if so, what are your objections to the method used to "decode" the alien message?

    (2) Are you familiar with the artificial language Loglan, and if so, do you have any wise insights concerning it?