Letter Serial Correlation points to languages evolution

Posted 6 March 2005 by

This essay has been called to life by Steve Reuland’s post to Panda’s Thumb titled “What good is half an underlying language structure?” ( www.pandasthumb.org/pt-archives/000853.html ) which refers to Carl Zimmer’s posts to Loom (http://www.corante.com/loom/archives/2005/02/25/bu… and http://www.corante.com/loom/archives/2005/03/01/bu… ). One point touched upon in passing by Zimmer and by some comments’ writers, was the question of whether or not natural languages have all evolved from the same proto-language.

21 Comments

Intelligent Design Theorist Timmy · 6 March 2005

Intelligent Design Linguistic Theory (i.e. Genesis) knew this a long time ago.

Now the whole earth had one language and few words.

One day you scientists will discover that we're right on everything else too. ps--do NOT learn a foreign language. God does not like that one bit.

DaveScot · 6 March 2005

Hold on a second. Reed A. Cartwright, too clever by half, in the previous language thread corrected me by saying the discussion is about the evolution of language ability, not the evolution of language. So shouldn't this

One point touched upon in passing by Zimmer and by some comments' writers, was the question of whether or not natural languages have all evolved from the same proto-language.

actually be this, according to the Reed Hypothesis (emphasis my changes)

One point touched upon in passing by Zimmer and by some comments' writers, was the question of whether or not natural language ability have all evolved from the same proto-language ability.

In the future, Reed, make sure your brain is engaged before putting your mouth in gear. Thanks in advance.

Keanus · 6 March 2005

Mark, I'm neither a linguist nor a competent statistical analyst, but, if my memory is correct, you've restricted your analysis to nine Indo-European languages and three non-Indo-European languages (Finnish, Aramaic and Hebrew). And unless I missed something, you've also analyzed the letter strings and frequencies but not phonemes (I do recognize that you've probably mapped the equivalent, meaning "same" sound, characters of different alphabets to each other). That may be a suitable first step, to refine your technique, but shouldn't you be looking at languages outside the Indo-European group, like Indo-Chinese, Malay-Polynesian, the languages of sub-Saharan Africa, Japanese, Dravidian, the various American Indian tongues, and other rarer languages? And shouldn't you be looking at the phoneme patterns and frequencies and syntactical structures, which more accurately reflect the substance of a language than the characters found in the written form?

Having done nothing but ask questions, I should add that I've often thought that a careful analysis of existing languages, especially before some of the rarer ones go extinct, offers the potential for shedding as much or more light on the radiation of human culture and evolution as the analysis of DNA. And that being my perception, I've been surprised not to read in the popular press articles and essays about how languages mirror human development.

Keanus · 6 March 2005

At the risk of humoring the troll, let me add my two cents' worth in response to DaveScot. I don't recall and will not bother to search out what Reed may or may not have said, but in considering the origin of language at least two basic questions come to mind: 1) When and how did the capacity for language appear and evolve? 2) How did language itself, its development and use, arise and evolve? The two are related but different questions that may have related or totally different and unrelated answers. And the origins of ability and actual usage could well have been many thousands or millions of years apart.

I've often thought that a reasonable hypothesis would be that the capacity for language arose once, but that language itself arose multiple times in multiple places among isolated groups of early hominids or pre-hominids (or the first early modern humans), giving rise to the multiplicity of known language groups. The real challenge will be determining if basic syntax is a function of a hardwired brain or embedded in the nature of language itself.

Mark Perakh · 6 March 2005

Keanus: Thanks for your comment. I agree with you that the study Brendan and I did was very limited in scope (although it took a lot of time and effort) and that more languages should have been studied, including all those you've listed. I am too old to expect that I'll be able to conduct such an extensive study, and Brendan is too busy with his main interests, which are in combinatorics, so the continuation of the LSC study could conceivably happen only if some younger people take it up. So far, besides our original study, I know of only one other case of our technique being used by other people. Two prominent Voynich manuscript scholars applied our method and have essentially confirmed the data I reported on my site for the Voynich manuscript. They are interested only in decoding Voynich, so they did not conduct any measurements beyond that limited goal. I am not even sure that Brendan and I will ever submit a paper to a print publication, because after I posted the data on my site we made a number of improvements in technique, but I never got around to writing them up in an orderly way; it all sits in emails between me and Brendan. Cheers, Mark

Reed A. Cartwright · 6 March 2005

Umm, Dave, notice the phrase "in passing" in Mark's comment?

In the future, Reed, make sure your brain is engaged before putting your mouth in gear. Thanks in advance.

— DaveScot
If you continue to behave like an ass, I will ban you. Consider yourself warned.

DaveScot · 7 March 2005

Ummm... Reed, why don't you warn all the others who go far beyond me, routinely, in causing offense? Let me know if you need any examples.

DaveScot · 7 March 2005

If you continue to behave like an ass, I will ban you. Consider yourself warned.

— Reed
It would be nice if you exercised that power even handedly. I can give you examples of ad hominem abuse here far more egregious than mine but I suspect you know what I'm talking about without specific examples.

DaveScot · 7 March 2005

Sorry about duplicate posting. The first was somehow delayed in posting by a LONG time.

pough · 7 March 2005

Is it a case of "ad hominem abuse" (which is a bizarre term, IMO) or is it simply a case of wandering into someone's blog (which is in many ways a personal space, or at least personal property) and annoying them? Is Reed a non-partisan referee in a come-one, come-all institution, or is he a guy who has a blog and doesn't feel like letting some jerkass make a nuisance of himself much longer? People have the strangest ideas about websites.

Roger Tang · 7 March 2005

"Ummm . . . Reed, why don't you warn all the others who go far beyond me, routinely, in causing offense"

He does. Pay attention.

DaveScot · 7 March 2005

pough

"jerkass"

very nice term there

maybe reed agrees that it was necessary

Reed?

Enough · 7 March 2005

Dave, stop antagonizing. It doesn't make you a bigger man.

Dick Thompson · 7 March 2005

Two questions.

What were your results for the Voynich ms?

Have you ever tried your method on the Enochian Calls (google)? They are supposed to have the same source as Voynich (John Dee's slimy associate).

Mark Perakh · 7 March 2005

Re: comment by Dick Thompson, # 19232. Dick asked

What were your results for the Voynich ms?

The results of the study of LSC in Voynich can be seen at http://members.cox.net/marperak/Texts/voynich1.htm and http://members.cox.net/marperak/Texts/voynich2.htm. Regarding Dick's second question, the answer is No. Mark

Loren Petrich · 8 March 2005

Although Mark Perakh's work is very interesting, I don't see how it demonstrates common descent of languages. What he has found is IMO more likely a side effect of how our minds/brains work.

There are several other phenomena of language that may reasonably be explained in such a fashion, without resorting to the hypothesis of common descent. One is assimilation of neighboring sounds, where one sound is altered to make it flow more smoothly into a neighboring sound. For example, a prefix ending with -n, like in-, con-, or syn-, has its /n/ sound changed to /m/ or /ng/ or /r/ or /l/ depending on what sound comes next, simply because it is easier to pronounce that way. Also, the English indefinite article was originally always "an", but nowadays it loses the n before a consonant for the same reason.

To demonstrate common descent, one has to find features of language that are (1) difficult to explain with such hypotheses and (2) rarely borrowed (borrowing being the linguistic version of lateral gene transfer). Morphology and basic vocabulary are good places to look, though they are not absolutely unborrowable. This conclusion is reached by studying languages with long written records; though they may change over time, what I've described generally holds true. Compare present-day English with Old English: though present-day English has a much simpler morphology than Old English and an enormous quantity of borrowings, most of the more basic vocabulary is well-preserved, and there is even a fair amount of continuity in grammar, like formation of verb past tenses and participles, especially irregular formation of these.

One can do the same with Latin and the Romance languages; despite various changes, much continuity is still recognizable in basic vocabulary and grammar.

This means that we can extrapolate beyond where the paper trails end and infer the existence of long-gone languages. This is easy for the Germanic languages, like English and German; they have an abundance of shared vocabulary and grammatical features. All but modern English have definite vs. indefinite adjective declensions, and all have two types of verb conjugation: the "strong" (vowel shifts, like English sing, sang, sung), and the "weak" (English -ed and cognates).

One can do likewise for other language families, like Celtic, Baltic, Slavic, Indic, Semitic, etc.; one can even find bigger families like Indo-European. However, the farther and farther one looks back in time, the more details get obscured by language change; for that reason, such proposed families as Nostratic and Sino-Caucasian are not widely accepted.

Here is a simple table:

Indo-European:
  Germanic:
    English: me, one, two, three, ten, name, sun, star
    Old English: me-, an, twa, thri, tien, nama, sunne, steorra
    German: mi-, eins, zwei, drei, zehn, Name, Sonne, Stern
    Swedish: mi-, en, tva, tre, tio, namn, sol, stjarna
    Gothic: mi-, ains, twai, threis, taihun, namo, sunna, stairno
  Slavic:
    Russian: me-, odin, dva, tri, desyat', imya, solntse, zvezda
    Serbo-Croatian: mi, jedan, dva, tri, deset, ime, sunce, zvijezda
    Bulgarian: me-, edin, dva, tri, deset, ime, sluntse, zvezda
  Celtic:
    Irish Gaelic: me-, aon, do, tri, deich, ainm, grian, realta
    Breton: me, unan, daou, tri, dek, anv, heol, sterenn
  Latin-Romance:
    Latin: me-, unus, duo, tres, decem, nomen, sol, stella
    Italian: me, uno, due, tre, dieci, nome, sole, stella
    Spanish: me, uno, dos, tres, diez, nombre, sol, estrella
    French: me, un, deux, trois, dix, nom, soleil, etoile
  Hellenic:
    Classical Greek: eme, heis, duo, treis, deka, onoma, helios, aster
  Indic:
    Sanskrit: ma-, eka, dvaa, trayas, dasha, naama, surya, taara
    Hindi: mai, ek, do, tin, das, nam, surya, tara
    Bengali: ami, aek, dui, tin, dash, nam, surya, tara
    Sinhalese: ma-, eka, deka, tuna, dahaya, nama, ira, tharuwa

  Ancestral IE (reconstructed): *me-, *oinos, *dwo, *treyes, *dekm, *nomn, *sawel, *ster

Uralic:
  Finnish: mi-, yksi, kaksi, kolme, kymmenen, nimi, aurinko, tähti
  Hungarian: ?, egy, kettö, három, tiz, név, nap, csillag

Semitic:
  Hebrew: -i, ahat, shtayim, shalosh, eser, shem, shemesh, kokhab
  Arabic: -i, waahid, ithnaan, thalaatha, 'ashara, ism, shams, kaukab

Sumerian: ?, desh, min, pesh, hu, mu, utu, kilib

Basque: ?, bat, bi, hiru, hamar, ?, ?, ?

Notice the varying amounts of resemblance. There is a little bit of resemblance between Indo-European and Uralic; the proposed Nostratic family includes these two. And there is even less resemblance between these two and Semitic.

This treelike pattern of resemblance resembles what one finds from biological evolution, and has a similar explanation; it is unexplained by mythologies like the Tower of Babel story.
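The "varying amounts of resemblance" in the table can be given a very rough numerical form. The sketch below is only an illustration, assuming that plain string similarity (Python's standard `difflib.SequenceMatcher`) is an acceptable stand-in for resemblance; the helper names `word_similarity` and `list_similarity` are made up for this example. Historical linguists rely on regular sound correspondences rather than on surface string matching, so the scores are at best a crude proxy.

```python
from difflib import SequenceMatcher

# Word lists copied from the table above
# (glosses: me, one, two, three, ten, name, sun, star).
WORDS = {
    "English": ["me", "one", "two", "three", "ten", "name", "sun", "star"],
    "German": ["mi-", "eins", "zwei", "drei", "zehn", "Name", "Sonne", "Stern"],
    "Latin": ["me-", "unus", "duo", "tres", "decem", "nomen", "sol", "stella"],
    "Finnish": ["mi-", "yksi", "kaksi", "kolme", "kymmenen", "nimi", "aurinko", "tähti"],
    "Hebrew": ["-i", "ahat", "shtayim", "shalosh", "eser", "shem", "shemesh", "kokhab"],
}


def word_similarity(a, b):
    """Similarity of two words in [0, 1], ignoring case and stem hyphens."""
    a = a.lower().strip("-")
    b = b.lower().strip("-")
    return SequenceMatcher(None, a, b).ratio()


def list_similarity(lang_a, lang_b):
    """Average word similarity over the shared glosses of two languages."""
    pairs = zip(WORDS[lang_a], WORDS[lang_b])
    # Skip glosses marked "?" (unknown) in either list.
    scores = [word_similarity(a, b) for a, b in pairs if "?" not in (a, b)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for other in ["German", "Latin", "Finnish", "Hebrew"]:
        score = list_similarity("English", other)
        print(f"English vs {other}: {score:.2f}")
```

Eight words per language is far too small a sample to support any conclusion; the point is only that the graded, treelike resemblance can be made explicit enough to compare.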

Mark Perakh · 8 March 2005

Thanks, Loren, for your interesting comment (# 19267). You wrote

Although Mark Perakh's work is very interesting, I don't see how it demonstrates common descent of languages. What he has found is IMO more likely a side effect of how our minds/brains work.

First, thanks for your kind words regarding the LSC work (one correction: Brendan McKay is my co-author, so referring to it as just Mark Perakh's work is imprecise). I agree with your notion that the features unearthed by the LSC method in the studied languages reflect how our minds/brains work. The same can, perhaps, be said about any feature of a language. The human ability for language is a function of the human brain, isn't it, so whatever features a language has, they all somehow stem from how our brain works. Whether to call them "side effects" or view them as principal features of the brain's work is unclear until we understand in detail how the brain works, a goal that will hopefully be reached some day. Being displays of how the brain works does not prevent these features from pointing to the common descent of all languages; human brains presumably all work the same way regardless of race, ethnicity, etc., right? The common descent of languages may be a natural result of that similarity in how the brains of various ethnic groups work. I don't know, Loren, whether you base your fine comment only on this post of mine to PT or on having perused the eight articles on my site. In the latter case you could have seen all those curves of the LSC sums, which have the same principal shape for all studied meaningful texts but not for gibberish, regardless of the gibberish texts' structure. This is IMHO an impressive manifestation of the intrinsic unity of all studied languages. Does it "prove" the common descent of these languages? No, it does not. But it does, IMHO, jibe with such a hypothesis. It may be just one more piece of circumstantial evidence in favor of common descent. I have no comments on the rest of your interesting remarks, and thank you again for taking the time to write such a detailed and enlightening post. Best wishes, Mark
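For readers who want a feel for what an "LSC sum" curve might look like, here is a minimal sketch. The definition used is an assumption made for illustration and is not spelled out anywhere in this thread: the text is cut into consecutive chunks of equal size, each letter is counted in every chunk, and the squared differences between the counts in adjacent chunks are summed. The function name `lsc_sum` and the sample text are hypothetical; the articles on the site give the actual procedure.

```python
import random
from collections import Counter


def lsc_sum(text, chunk_size):
    """Sum of squared differences of letter counts between adjacent chunks.

    Assumed definition, for illustration only: non-letters are dropped,
    the remaining letters are split into full chunks of `chunk_size`,
    and any leftover tail shorter than a chunk is ignored.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    n_chunks = len(letters) // chunk_size
    counts = [
        Counter(letters[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_chunks)
    ]
    alphabet = set(letters)
    total = 0
    for left, right in zip(counts, counts[1:]):
        for ch in alphabet:
            total += (left[ch] - right[ch]) ** 2
    return total


if __name__ == "__main__":
    # Placeholder sample; a real measurement needs a long text.
    sample = "now the whole earth had one language and few words " * 40
    shuffled = list(sample)
    random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable
    shuffled = "".join(shuffled)
    for n in (5, 10, 20, 50, 100):
        print(n, lsc_sum(sample, n), lsc_sum(shuffled, n))
```

Plotting the sum against chunk size for a real text and for a randomly permuted copy of the same text is one way to see whether the two curves differ in shape, which is the kind of comparison between meaningful text and gibberish described above.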

Loren Petrich · 8 March 2005

Yes, Mark Perakh, I had read all those articles you'd written; that statistical regularity seems interesting. And I think that this work ought to be expanded to different genres of text, like conversation transcripts vs. expository writing vs. creative writing vs. poetry. And also to "plain" vs. "flowery" and "serious" vs. "funny".

But I still think that that unity is a side effect of how we process language; how much of a short-term capacity our brains have.

And descent from a shared ancestor has no direct connection with brain mechanisms; several early human populations may have invented ancestral languages separately. However, I believe that to be unlikely, for these reasons:

(1) Our brains have adaptations for interpreting and generating language.

(2) Language is a human universal; no full-scale human society has ever been found without it.

So our species would always have had language, and the same could have been true of some ancestral species.

And if our species had originated from some relatively small offshoot population (the Punctuated Equilibrium picture), that population would likely have had a single language. Meaning that all present-day languages are descended from a single one.

But reconstructing it is something that most mainstream linguists refuse to think about, because it seems next to impossible. Ancestral Indo-European was spoken about 5000-6000 years ago; this ancestral language was spoken about 100,000 years ago.

And finally, I'm not sure what would be a good online introduction to historical/comparative linguistics. Shall I search for one?

Mark Perakh · 9 March 2005

Loren, I would be happy if the LSC study were expanded. Unfortunately there is hardly a chance I'll be able to do so (the same relates to Brendan, although for a different reason). I'd welcome any young folks taking it up and would be happy to answer any questions they might have in relation to the experiments or measurements. Best!

DonkeyKong · 9 March 2005

Interesting but inconclusive.

English has words in it from French that were borrowed within the time of recorded human history.

These are words that had English equivalents that were simply replaced by French ones.

Now English words are replacing French words.

Both of these language exchanges are independent of the similarities of ancient English and ancient French.

If you have 100k years there will be similarities regardless of whether there was 1 original language or 100 original languages.

Sorry but the answers are forever unknowable unless you have a time machine...

Emanuele Oriano · 9 March 2005

Wow, Mr. DonkeyKong!

Your command of linguistics is as impressive as your mastery of biology.