This essay was prompted by Steve Reuland’s post to Panda’s Thumb titled “What good is half an underlying language structure?” ( www.pandasthumb.org/pt-archives/000853.html ), which refers to Carl Zimmer’s posts to Loom (http://www.corante.com/loom/archives/2005/02/25/bu… and http://www.corante.com/loom/archives/2005/03/01/bu… ). One point touched upon in passing by Zimmer and by some commenters was the question of whether or not natural languages have all evolved from the same proto-language.
21 Comments
Intelligent Design Theorist Timmy · 6 March 2005
DaveScot · 6 March 2005
Keanus · 6 March 2005
Mark, I'm neither a linguist nor a competent statistical analyst, but, if my memory is correct, you've restricted your analysis to nine Indo-European languages and three non-Indo-European languages (Finnish, Aramaic and Hebrew). And unless I missed something, you've also analyzed the letter strings and frequencies but not phonemes (I do recognize that you've probably mapped the equivalent, meaning "same" sound, characters of different alphabets to each other). That may be a suitable first step, to refine your technique, but shouldn't you be looking at languages outside the Indo-European group like Indo-Chinese, Malay-Polynesian, the languages of sub-Saharan Africa, Japanese, Dravidian, the various American Indian tongues, and other rarer languages? And shouldn't you be looking at the phoneme patterns and frequencies and syntactical structures, which more accurately reflect the substance of a language than the characters found in the written form?
Having done nothing but ask questions, I should add that I've often thought that a careful analysis of existing languages, especially before some of the rarer ones go extinct, offers the potential for shedding as much or more light on the radiation of human culture and evolution as the analysis of DNA. And that being my perception, I've been surprised not to read in the popular press articles and essays about how languages mirror human development.
Keanus · 6 March 2005
At the risk of humoring the troll, let me add my two cents' worth in response to DaveScot. I don't recall and will not bother to search out what Reed may or may not have said, but in considering the origin of language at least two basic questions come to mind: 1) When and how did the capacity for language appear and evolve? 2) How did language itself, its development and use, arise and evolve? The two are related but different questions that may have related or totally different and unrelated answers. And the origins of ability and actual usage could well have been many thousands or millions of years apart.
I've often thought that a reasonable hypothesis would be that the capacity for language arose once, but that language itself arose multiple times in multiple places among isolated groups of early hominids or pre-hominids (or the first early modern humans), giving rise to the multiplicity of known language groups. The real challenge will be determining if basic syntax is a function of a hardwired brain or embedded in the nature of language itself.
Mark Perakh · 6 March 2005
Keanus: Thanks for your comment. I agree with you that my and Brendan's study was very limited in scope (although it took a lot of time and effort) and that more languages should have been studied, including all those you've listed. I am too old to expect that I'll be able to conduct such an extensive study, and Brendan is too busy with his main interests, which are in combinatorics, so the continuation of the LSC study could conceivably happen only if some younger people take it up. So far, besides our original study, I know of only one other case of the use of our technique by other people. Two prominent Voynich manuscript scholars applied our method and have essentially confirmed the data I reported on my site for the Voynich manuscript. They are interested only in decoding Voynich, so they did not conduct any measurements beyond that limited goal. I am not even sure that Brendan and I will ever submit a paper to a print publication, because after I posted the data on my site we made a number of improvements in technique, but I never got around to putting them into orderly writing; it all sits in emails between me and Brendan. Cheers, Mark
Reed A. Cartwright · 6 March 2005
DaveScot · 7 March 2005
Ummm... Reed, why don't you warn all the others who go far beyond me, routinely, in causing offense? Let me know if you need any examples.
DaveScot · 7 March 2005
DaveScot · 7 March 2005
Sorry about duplicate posting. The first was somehow delayed in posting by a LONG time.
pough · 7 March 2005
Is it a case of "ad hominem abuse" (which is a bizarre term, IMO) or is it simply a case of wandering into someone's blog (which is in many ways a personal space, or at least personal property) and annoying them? Is Reed a non-partisan referee in a come-one, come-all institution, or is he a guy who has a blog and doesn't feel like letting some jerkass make a nuisance of himself much longer? People have the strangest ideas about websites.
Roger Tang · 7 March 2005
"Ummm . . . Reed, why don't you warn all the others who go far beyond me, routinely, in causing offense"
He does. Pay attention.
DaveScot · 7 March 2005
pough
"jerkass"
very nice term there
maybe reed agrees that it was necessary
Reed?
Enough · 7 March 2005
Dave, stop antagonizing. It doesn't make you a bigger man.
Dick Thompson · 7 March 2005
Two questions.
What were your results for the Voynich ms?
Have you ever tried your method on the Enochian Calls (google)? They are supposed to have the same source as Voynich (John Dee's slimy associate).
Mark Perakh · 7 March 2005
Loren Petrich · 8 March 2005
Although Mark Perakh's work is very interesting, I don't see how it demonstrates common descent of languages. What he has found is IMO more likely a side effect of how our minds/brains work.
There are several other phenomena of language that may reasonably be explained in such a fashion, without resorting to the hypothesis of common descent. One is assimilation of neighboring sounds, where one sound is altered to make it flow more smoothly with a neighboring sound. For example, a prefix ending with -n, like in-, con-, or syn-, has its /n/ sound changed to /m/ or /ng/ or /r/ or /l/ depending on what sound comes next, simply because it is easier to pronounce that way. Also, the English indefinite article was originally always "an", but nowadays it loses the n before a consonant for that reason.
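The prefix example can be sketched in a few lines of Python. This is only a rough illustration and rests on assumptions: it works on English spelling rather than on actual sounds, the function name `attach_prefix` is made up for the example, and the rule set is a simplification that ignores exceptions.

```python
def attach_prefix(prefix: str, stem: str) -> str:
    """Attach an n-final prefix, assimilating the n to the stem's first letter.

    Simplified, spelling-based rule set (an assumption for illustration):
      before b, p, m -> m   (in + possible -> impossible)
      before l, r    -> same letter (in + legal -> illegal)
      otherwise      -> n stays as written
    """
    if not prefix.endswith("n") or not stem:
        return prefix + stem
    first = stem[0].lower()
    if first in "bpm":
        final = "m"
    elif first in "lr":
        final = first
    else:
        # Before k or g the spelling keeps n, although the sound becomes /ng/.
        final = "n"
    return prefix[:-1] + final + stem


for prefix, stem in [("in", "possible"), ("in", "legal"), ("con", "rect"),
                     ("syn", "pathy"), ("in", "active")]:
    print(prefix, "+", stem, "->", attach_prefix(prefix, stem))
```

The point is that the rule depends only on the neighboring sound, so any speaker of any language could arrive at it independently.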
To demonstrate common descent, one has to find features of language that are (1) difficult to explain with such hypotheses and (2) rarely borrowed, borrowing being the linguistic version of lateral gene transfer. Morphology and basic vocabulary are good places to look, though they are not absolutely unborrowable. This conclusion is reached by studying languages with long written records; though they may change over time, what I've described generally holds true. Compare present-day English with Old English: though present-day English has a much simpler morphology than Old English and an enormous quantity of borrowings, most of the more basic vocabulary is well-preserved, and there is even a fair amount of continuity in grammar, like formation of verb past tenses and participles, especially irregular formation of these.
One can do the same with Latin and the Romance languages; despite various changes, much continuity is still recognizable in basic vocabulary and grammar.
This means that we can extrapolate beyond where the paper trails end and infer the existence of long-gone languages. This is easy for the Germanic languages, like English and German; they have an abundance of shared vocabulary and grammatical features. All but modern English have definite vs. indefinite adjective declensions, and all have two types of verb conjugation: the "strong" (vowel shifts, like English sing, sang, sung), and the "weak" (English -ed and cognates).
One can do likewise for other language families, like Celtic, Baltic, Slavic, Indic, Semitic, etc.; one can even find bigger families like Indo-European. However, the farther and farther one looks back in time, the more details get obscured by language change; for that reason, such proposed families as Nostratic and Sino-Caucasian are not widely accepted.
Here is a simple table of basic vocabulary (the English row gives the meaning of each column):
Indo-European:
Germanic:
English: me, one, two, three, ten, name, sun, star
Old English: me-, an, twa, thri, tien, nama, sunne, steorra
German: mi-, eins, zwei, drei, zehn, Name, Sonne, Stern
Swedish: mi-, en, tva, tre, tio, namn, sol, stjarna
Gothic: mi-, ains, twai, threis, taihun, namo, sunna, stairno
Slavic:
Russian: me-, odin, dva, tri, desyat', imya, solntse, zvezda
Serbo-Croatian: mi, jedan, dva, tri, deset, ime, sunce, zvijezda
Bulgarian: me-, edin, dva, tri, deset, ime, sluntse, zvezda
Celtic:
Irish Gaelic: me-, aon, do, tri, deich, ainm, grian, ralta
Breton: me, unan, daou, tri, dek, anv, heol, sterenn
Latin-Romance:
Latin: me-, unus, duo, tres, decem, nomen, sol, stella
Italian: me, uno, due, tre, dieci, nome, sole, stella
Spanish: me, uno, dos, tres, diez, nombre, sol, estrella
French: me, un, deux, trois, dix, nom, soleil, etoile
Hellenic:
Classical Greek: eme, heis, duo, treis, deka, onoma, helios, aster
Indic:
Sanskrit: ma-, eka, dvaa, trayas, dasha, naama, surya, taara
Hindi: mai, ek, do, tin, das, nam, surya, tara
Bengali: ami, aek, dui, tin, dash, nam, surya, tara
Sinhalese: ma-, eka, deka, tuna, dahaya, nama, ira, tharuwa
Ancestral IE: *me-, *oinos, *dwo, *treyes, *dekm, *nomn, *sawel, *ster
(reconstructed)
Uralic:
Finnish: mi-, yksi, kaksi, kolme, kymmenen, nimi, aurinko, tähti
Hungarian: ?, egy, kettö, három, tiz, név, nap, csillag
Semitic:
Hebrew: -i, ahat, shtayim, shalosh, eser, shem, shemesh, kokhab
Arabic: -i, waahid, ithnaan, thalaatha, 'ashara, ism, shams, kaukab
Sumerian: ?, desh, min, pesh, hu, mu, utu, kilib
Basque: ?, bat, bi, hiru, hamar, ?, ?, ?
Notice the varying amounts of resemblance. There is a little bit of resemblance between Indo-European and Uralic; Nostratic includes these two. And there is even less between these two and Semitic.
This treelike pattern of resemblance parallels what one finds in biological evolution, and has a similar explanation; it is unexplained by mythologies like the Tower of Babel story.
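To make "varying amounts of resemblance" a little more concrete, here is a minimal Python sketch that scores pairs of rows from the table above by average normalized edit distance. It is a crude, spelling-based proxy chosen only for illustration: it is not the LSC method discussed in the essay, and not how historical linguists establish relatedness (they rely on regular sound correspondences). The names `edit_distance`, `similarity` and `WORDS` are invented for this example, and the "me" column is left out because several entries are stems or unknown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-letter edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(words_a, words_b) -> float:
    """Mean of 1 - distance / longer length over aligned word pairs (0 to 1)."""
    scores = []
    for a, b in zip(words_a, words_b):
        a, b = a.lower(), b.lower()
        scores.append(1 - edit_distance(a, b) / max(len(a), len(b)))
    return sum(scores) / len(scores)


# Columns: one, two, three, ten, name, sun, star (taken from the table above).
WORDS = {
    "English": ["one", "two", "three", "ten", "name", "sun", "star"],
    "German":  ["eins", "zwei", "drei", "zehn", "Name", "Sonne", "Stern"],
    "Latin":   ["unus", "duo", "tres", "decem", "nomen", "sol", "stella"],
    "Spanish": ["uno", "dos", "tres", "diez", "nombre", "sol", "estrella"],
    "Finnish": ["yksi", "kaksi", "kolme", "kymmenen", "nimi", "aurinko", "tähti"],
}

for x, y in [("Latin", "Spanish"), ("English", "German"),
             ("Latin", "Finnish"), ("English", "Finnish")]:
    print(f"{x:8s} vs {y:8s}: {similarity(WORDS[x], WORDS[y]):.2f}")
```

With these seven words, the pairs within a family score higher than the pairs across families. Such a small sample cannot show more than that, and surface similarity by itself cannot separate inheritance from borrowing or chance.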
Mark Perakh · 8 March 2005
Loren Petrich · 8 March 2005
Yes, Mark Perakh, I had read all those articles you'd written; that statistical regularity seems interesting. And I think that this work ought to be expanded to different genres of text, like conversation transcripts vs. expository writing vs. creative writing vs. poetry. And also to "plain" vs. "flowery" and "serious" vs. "funny".
But I still think that that unity is a side effect of how we process language, in particular of how much short-term capacity our brains have.
And descent from a shared ancestor has no direct connection with brain mechanisms; several early human populations may have invented ancestral languages separately. However, I believe that to be unlikely, for these reasons:
(1) Our brains have adaptations for interpreting and generating language.
(2) Language is a human universal; no full-scale human society has ever been found without it.
So our species would always have had language, and the same could have been true of some ancestral species.
And if our species had originated from some relatively small offshoot population (the Punctuated Equilibrium picture), that population would likely have had a single language. Meaning that all present-day languages are descended from a single one.
But reconstructing it is something that most mainstream linguists refuse to think about, because it seems next to impossible. Ancestral Indo-European was spoken about 5000-6000 years ago, whereas this hypothetical ancestor of all languages would have been spoken about 100,000 years ago.
And finally, I'm not sure what would be a good online introduction to historical/comparative linguistics. Shall I search for one?
Mark Perakh · 9 March 2005
Loren, I would be happy if the LSC study were expanded. Unfortunately there is hardly a chance I'll be able to do so (the same relates to Brendan, although for a different reason). I'd welcome any young folks taking it up and would be happy to answer any questions they might have in relation to the experiments or measurements. Best!
DonkeyKong · 9 March 2005
Interesting but inconclusive.
English has words in it from French that arrived within the time of human history.
These are words that had English equivalents that were simply replaced by French ones.
Now English words are replacing French words.
Both of these language exchanges are independent of the similarities of ancient English and ancient French.
If you have 100k years there will be similarities regardless of whether there was 1 original or 100 original languages.
Sorry but the answers are forever unknowable unless you have a time machine...
Emanuele Oriano · 9 March 2005
Wow, Mr. DonkeyKong!
Your command of linguistics is as impressive as your mastery of biology.