In this episode of Icons of ID I will take a quick look at how the definition of information used by ID proponents amounts to nothing more than an argument from probability. In fact, when ID proponents claim that chance and regularity cannot create complex specified information (CSI), all they are saying is that such pathways, as far as we know, are improbable. If a pathway is found that is probable, the measure of information, which is confusingly linked to probability, decreases.
In fact, I argue that intelligent designers similarly cannot generate complex specified information, since the probability of intelligent designers designing is close to 1.
Information and probability
Elsberry and Wilkins on CSI
Then again, the choice of the term “complex specified information” is itself extremely problematic, since for Dembski “complex” means neither “complicated” as in ordinary speech, nor “high Kolmogorov complexity” as understood by algorithmic information theorists. Instead, Dembski uses “complex” as a synonym for “improbable”.
So how does Dembski define information?
I(X) = -\log_2 P(X)
So in other words, information is the negative logarithm of the probability. But which probability is this? Others have shown how Dembski is unclear on this issue and often shifts between uniform probabilities and actual probabilities, whichever seems more convenient.
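To see how mechanical this measure is, here is a minimal Python sketch (my own illustration, not code from Dembski; the probabilities are hypothetical) showing that the same event can carry 100 bits or 1 bit of “information” depending purely on which probability one plugs in:

```python
import math

def dembski_information(p):
    """Dembski's 'information' I(X) = -log2 P(X): a rescaled probability."""
    return -math.log2(p)

# The same outcome evaluated under two choices of probability:
uniform_p = 2.0 ** -100  # uniform chance over all 2**100 binary strings
actual_p = 0.5           # a hypothetical 'actual' generative probability

print(dembski_information(uniform_p))  # 100.0 bits -- looks highly 'complex'
print(dembski_information(actual_p))   # 1.0 bit   -- hardly complex at all
```

Nothing about the event itself changes between the two calls; only the probability assignment does, which is exactly the ambiguity noted above.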
Dembski mentions in NFL that this measure of information is similar to Shannon information. In fact, Shannon’s entropy is the average of Dembski’s information measure. This confusion between information and entropy is not limited to Dembski’s writings, however, so let’s look at Shannon entropy and information in more detail.
Claude Shannon: A mathematical theory of communication
In 1948 Shannon published his seminal paper “A mathematical theory of communication”.
Shannon shows that the logarithm is the natural choice for expressing the concept of information. Entropy, a weighted measure of information, is basically the expected value of the information present. In other words:
If there are n messages X = {X_1, …, X_n} with probabilities p(X_1), …, p(X_n), then the Shannon entropy of this set is defined as:

H(X) = -\sum_{i=1}^{n} p(X_i) \log_2 p(X_i)
or in other words
H(X) = E(I(X))
Entropy is maximal when all outcomes are equiprobable.
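A quick sketch (my own illustration, with made-up distributions) confirms both the definition and the claim that the uniform distribution maximizes entropy:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum p(X_i) * log2 p(X_i), i.e. E(I(X))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: uniform, the maximum for n = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ~1.36 bits: a skewed distribution
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0 bits (printed as -0.0): a certain outcome
```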
Information is defined as
I_S(X) = H_{max} - H(X)
Information in the Shannon sense is defined as the change in entropy before and after a particular event has taken place. Shannon information, also known as surprisal, is any data that is not already known. In fact, when rare events occur, they generate a lot of information.
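The following sketch (again my own illustration, with hypothetical numbers) shows I_S(X) = H_max - H(X) at work: as evidence concentrates the probabilities, entropy drops, and the difference is the information gained.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_max = entropy([0.25] * 4)                  # 2.0 bits: four equiprobable alternatives
h_after = entropy([0.85, 0.05, 0.05, 0.05])  # ~0.85 bits: one alternative now dominates
print(h_max - h_after)                       # ~1.15 bits of information gained
```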
Tom Schneider has some good resources on this.
So what we have learned so far is that Dembski’s information measure is nothing more than a rescaled probability, one that resembles Shannon’s entropy measure rather than Shannon’s information measure. The choice of the term “information” is therefore quite unfortunate.
So let’s try to understand why Dembski argues that regularity and chance cannot create CSI. The answer is simple: if such processes have a high probability of success, their Dembski information measure will be low.
But the same problem applies to intelligent designers. Given a particular ‘intelligently designed’ event, its probability is high and thus its information is low. In other words, according to Dembski’s own measure, nothing can create CSI other than pure chance.
Not much of a useful tool, then, and the poor choice of the information measure has caused much unnecessary confusion, when in fact all Dembski is doing is repeating the age-old creationist argument that evolution or abiogenesis is too improbable.
Talkorigins has some good FAQs on what’s wrong with these arguments.
It seems that ID is not only theoretically flawed in its claims but also empirically flawed, in that it has failed to be scientifically relevant. To these flaws we can add Dembski’s probability-based arguments, made all the more confusing by his use of the term ‘information’ where ‘entropy’ would have been more accurate.
It seems that the intelligent designer is as powerless to create CSI as regular processes are. Or, to put it the other way around, an intelligent designer is exactly as capable of creating CSI as regular processes.
Tom Schneider attracted Dembski’s ire for showing how the simple processes of variation and selection can actually increase the information in a genome.
Dembski’s complexity measures have many problems.
Surprisingly, various ID proponents, such as Fred Heeren, seem to have taken Dembski’s claims at face value. Heeren quotes another unsupported, and in fact falsified, claim by Dembski:
William Dembski puts it this way: “Specified complexity powerfully extends the usual mathematical theory of information, known as Shannon information. Shannon’s theory dealt only with complexity, which can be due to random processes as well as to intelligent design. The addition of specification to complexity, however, is like a vise that grabs only things due to intelligence. Indeed, all the empirical evidence confirms that the only known cause of specified complexity is intelligence.”
Careless terminology, contradictory statements and examples, and inflated claims all seem to have made the design inference ‘quite problematic’.
Understanding what “regularity,” “chance,” and “design” mean in Dembski’s framework is made more difficult by some of his examples. Dembski discusses a teacher who finds that the essays submitted by two students are nearly identical (46). One hypothesis is that the students produced their work independently; a second hypothesis asserts that there was plagiarism. Dembski treats the hypothesis of independent origination as a Chance hypothesis and the plagiarism hypothesis as an instance of Design. Yet, both describe the matching papers as issuing from intelligent agency, as Dembski points out (47). Dembski says that context influences how a hypothesis gets classified (46). How context induces the classification that Dembski suggests remains a mystery.
Elsberry and Shallit have written an excellent paper, “Information Theory, Evolutionary Computation, and Dembski’s ‘Complex Specified Information’”. They address Dembski’s fallacious reliability claims, present the differences between rarefied design and ordinary design, and examine the problems with apparent versus actual complex specified information (CSI).
Intelligent design advocate William Dembski has introduced a measure of information called “complex specified information”, or CSI. He claims that CSI is a reliable marker of design by intelligent agents. He puts forth a “Law of Conservation of Information” which states that chance and natural laws are incapable of generating CSI.
In particular, CSI cannot be generated by evolutionary computation. Dembski asserts that CSI is present in intelligent causes and in the flagellum of Escherichia coli, and concludes that neither have natural explanations. In this paper we examine Dembski’s claims, point out significant errors in his reasoning, and conclude that there is no reason to accept his assertions.
31 Comments
Les Lane · 7 July 2004
Dembski has backed off his "Law of Conservation of Information". Immunoglobulin genes (for example) are information creating machines. Dembski recognizes this and now claims that natural systems can't create "complex information". The boundary between "simple information" and "complex information" is vague. The phrase "complex specified information" returns zero (of 13 million) articles in Science Citation Index.
rick pietz · 7 July 2004
I'm really torn by your post. On the one hand, I think the very idea that anyone spends time refuting people like Dembski is a nonproductive expenditure of intellectual capital. On the other hand, if crap like this isn't refuted, it grows a life of its own. On the third hand, the people who buy into this crap in the first place aren't ever going to read or hear the argument against it, and if they do, they'll accept Dembski's explanations, and it still takes on a life of its own.
Pre-'urban legends' die even harder than the new ones.
Pim van Meurs · 7 July 2004
I agree, and I am struggling with these issues as well. But I have found from past experience that although having correct data available may not convince committed creationists, it may prompt some to investigate further. As such, I believe that presenting the arguments for why the ID approaches do not work, in an accessible manner, is important.
T. Russ · 7 July 2004
Dembski's put up another essay on his site. Just giving you guys the heads up. Enjoy!
Information as a Measure of Variation By William Dembski
http://www.designinference.com/documents/2004.07.Variational_Information.pdf
steve · 7 July 2004
Steve · 7 July 2004
It's endlessly funny to me that lots of Cold Fusion papers were worth publishing, and no ID ones are. The IDiots can't even meet the bar of basic competence Cold Fusion research met.
Les Lane · 8 July 2004
Steve-
Thanks for the tip. "Cold fusion" returns 661 references on Science Citation Index. It's quantitatively roughly 20 times "more productive" than ID.
David Wilson · 12 July 2004
Pim van Meurs · 12 July 2004
By conflating information with probability, Dembski has introduced quite a difficulty, namely that information is no longer what we commonly take it to mean. Rather than information being a measure of 'surprise', information becomes very similar to the concept of entropy. Because of his usage of probability as an information measure, he is faced with the problem that neither regularity/chance nor intelligent designers can create complex specified information.
The definition of information I chose is indeed for a uniform distribution, which is not a bad assumption for initially random distributions and which matches Shannon's usage of these concepts.
See, for instance, Randomness, Order and Replication.
But you are right, the definition can easily be generalized further. In Dembski's latest opus he is somewhat more careful in his definitions, but his usage of 'self-information', or probability, for information generates a lot of confusion and seems self-contradictory.
David Wilson · 16 July 2004
Erik 12345 · 16 July 2004
Pim van Meurs wrote in the blog entry above:
"Dembski mentions in NFL that this measure of information is similar to Shannon information. In fact Shannon's entropy is the average of Dembski's information measure."
About Dembski's definition, David Wilson commented:
"This definition is not peculiar to Dembski. I have seen several texts on information theory which refer to the quantity -log2(X) as the amount of information one obtains when one learns that the event X has occurred. It is sometimes called the self-information of X. It is true, as Dembski notes in his latest opus, that the mathematical development of information theory makes very little direct use of this quantity. In textbooks its role seems to be confined to motivating the definition of entropy, and many (perhaps most) don't even mention it."
The first statement is, for reasons to be explained below, mathematically ill-defined and the second is partially, but not completely, wrong.
Let us recall that an event is a set of outcomes. For example, when we consider a tossed die, we often choose the outcomes to be the face of the die that lands upwards, i.e. the outcomes are 1, 2, 3, 4, 5, and 6. (One could make finer distinctions and consider also the orientation of the die and even its position on the table, but this is not common. An outcome is a result of the experiment that is fully resolved to whatever precision we have chosen to consider.) An example of an event is the set of all odd outcomes, i.e. {1,3,5}. Another example of an event is the set of all outcomes smaller than 5, i.e. {1,2,3,4}.
Now, it is true that both Dembski and some information theory textbooks define "information" and "self-information"/"surprisal", respectively, in an event A as
-log(Pr(A))
The difference between the definitions is in the restrictions on the A's we are allowed to plug into the formula. This is a difference that most people probably don't think too much about, but it is a very important one nonetheless. Dembski allows us to plug in any event A. Information theory textbooks, on the other hand, require us to first partition the possible outcomes into non-overlapping events A1, A2, ..., An. The events we are allowed to plug into the formula for "self-information"/"surprisal" are then restricted to the events of this partition. This restriction is crucial, because without it, it is meaningless to speak about average "self-information"/"surprisal" (note that Dembski makes no such restriction, and it is consequently meaningless to speak about the average of his information). We can meaningfully average over all outcomes, or over the events of a partition, but not over all events.
Shannon entropy is therefore NOT the average of Dembski's information measure and Dembski's definition is NOT the same as the "self-information"/"surprisal" introduced in some presentations of information theory.
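[A small sketch, my own illustration of Erik's point using the die example above, makes the distinction concrete: averaging surprisal over the cells of a partition is well defined and yields the entropy, whereas "all events" overlap, so their probabilities do not even sum to 1 and cannot serve as weights.]

```python
import math
from itertools import chain, combinations

outcomes = [1, 2, 3, 4, 5, 6]  # a fair die; each outcome has probability 1/6

def prob(event):
    """Probability of an event, i.e. a set of outcomes."""
    return len(event) / 6

def surprisal(event):
    return -math.log2(prob(event))

# A partition: non-overlapping events that together cover all outcomes.
partition = [{1, 3, 5}, {2, 4, 6}]  # odd vs. even
print(sum(prob(a) * surprisal(a) for a in partition))  # 1.0 bit: the entropy

# 'All events' are the 63 nonempty subsets of the outcomes; they overlap,
# so their probabilities sum to far more than 1 and cannot act as weights.
events = [set(s) for s in chain.from_iterable(
    combinations(outcomes, r) for r in range(1, 7))]
print(sum(prob(a) for a in events))  # 32.0, not 1.0 -- no meaningful average
```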
David Wilson · 17 July 2004
David Wilson · 17 July 2004
Erik 12345 · 17 July 2004
Richard Wein · 17 July 2004
It seems to me the issue here is not the formula that Dembski uses but the context in which he uses it. When information theorists use the formula I = -log2(P), they do so in a context where it makes some sense to interpret this as "information". As David correctly notes, however, Dembski's introduction of the concept of "information" into what is purely a statistical argument is entirely gratuitous.
The Design Inference tells us to reject hypothesis H when
P(S|H) < alpha
where S is a "specified event" and alpha is a probability bound as proposed by Dembski.
What Dembski then does is apply the transformation I = -log2(P) to this inequality, telling us to reject H when
-log2{P(S|H)} > -log2{alpha}
or
I(S|H) > -log2{alpha}
Clearly, there is no genuine difference between the original inequality and the transformed one. In this context I is not being used as a measure of information in any useful sense. It is merely a rescaled probability.
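A trivial numerical check (my own sketch; the alpha value is Dembski's universal probability bound of 10^-150, and the event probabilities are made up) confirms that the two forms of the rejection rule can never disagree, since -log2 is strictly decreasing:

```python
import math

alpha = 1e-150  # Dembski's universal probability bound

def reject_by_probability(p):
    return p < alpha

def reject_by_information(p):
    # I = -log2 p; the inequality flips because -log2 is decreasing.
    return -math.log2(p) > -math.log2(alpha)

for p in (1e-200, 1e-150, 1e-100, 0.5):
    assert reject_by_probability(p) == reject_by_information(p)
print("Identical decisions in every case: I is just a rescaled probability.")
```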
David also correctly notes that very little use of the quantity I = -log2(P) is made in information theory. Shannon's seminal 1948 paper does not even mention it. I believe this is because it is not in fact a useful measure of information. The useful work in Shannon information theory is performed by ensemble measures such as entropy.
Richard Wein · 17 July 2004
For what it's worth, here's a link to something I wrote on the subject a while ago:
http://www.talkorigins.org/design/faqs/nfl/#shannon
I should add that I don't claim to be any expert on this subject.
Erik 12345 · 18 July 2004
Erik 12345 · 18 July 2004
This post is just an experiment with the Post-a-Comment script. The first section was previewed and modified by the preview script. The second section was written after previewing and contains exactly the same text as the first section originally contained.
Section 1 (previewed): If we denote the number of binary digits in the codeword assigned to Ak by L(Ak), then the constraints (i) & (ii) turn out to be equivalent to the constraint
(*) SUM 2^-L(Ak) = L(A1) Pr(A1) + L(A2) Pr(A2) + . . . + L(An) Pr(An).
subject to (*) ...
Section 2 (not previewed): If we denote the number of binary digits in the codeword assigned to Ak by L(Ak), then the constraints (i) & (ii) turn out to be equivalent to the constraint
(*) SUM 2^-L(Ak) <= 1.
Minimizing the average codeword length
subject to (*) ...
Section 3: Conclusion. There is an evil undocumented feature in the preview script that can completely change the meaning of a comment.
Russell · 18 July 2004
Dr. 12345: "There is an evil undocumented feature in the preview script that can completely change the meaning of a comment."
Yes. I've noticed that. I think the evil lurks in the "less than" sign. I think it works if you don't preview, but once you do, it aborts everything that follows.
David Wilson · 19 July 2004
Erik 12345 · 19 July 2004
David Wilson · 20 July 2004
Erik 12345 · 23 July 2004
Erik 12345 · 23 July 2004
Erik 12345 · 23 July 2004
Russell · 23 July 2004
Dr. 12345: "Just for the record, to aid those who evaluate my posts by weighing my authority against the authority of my opponents, I'll note that I don't have a PhD in any field."
Yes, and Dembski has two, which speaks volumes about the significance of "PhD". I'm going to continue to think of "doctor" in its etymological sense ("teacher").
I gave up trying to follow the Wilson - 12345 discussion. It's over my head. But can we summarize for the masses? As I understand it DW said that one or a few or some of the technical indictments of Dembski's work are unwarranted. Much discussion between DW and E1 later: sort of yes, sort of no.
Big picture now: are there any mathematicians reading this who find Dembski's arguments, specifically with respect to biology, compelling?
I find his understanding of biology so ludicrous that I'm not strongly motivated to educate myself on the mathematical legerdemain he uses to rationalize it.
(He reminds me of a math prof at my college who had a "mathematical proof" that all numbers are equal to 47 (our school's numerical mascot). Only, that prof knew it was a joke.)
steve · 23 July 2004
I don't think it reflects poorly on the PhD degree. I have known many science PhDs, and they are all very intelligent. Unfortunately, intelligence is not always homogeneously distributed throughout someone's range of thinking. Some people are smart in everything they do, some are a little smarter in some things than others, and some people are intelligent in some respects and crazy in others.
There's no doubt Kurt Godel was among the brightest all-time logicians. And bright at math, bright at physics. Einstein enjoyed talking physics with him. Yet he was also somewhat crazy. He died of starvation because he thought everyone was out to poison him.
Lots of people are bright at some things, crazy at others. You can have a brilliant mathematician who thinks communism's a good idea. A brilliant journalist who thinks Sun Myung Moon is the second coming. It seems like especially on religious topics, some bright people can turn off their minds and keep believing nonsense. Like Shermer said, it's not that smart people are without stupid beliefs, but they're really good at coming up with justifications.
Russell · 23 July 2004
Steve:
I'm certainly not suggesting that a PhD is negatively correlated with having worthwhile thoughts to share. I am proposing, though, that you can garner any number of degrees without ever having a worthwhile thought to share.
In the case of our ID friends, it may be that there's some Godel-like island of competence that I'm not aware of. (Well, rhetoric. I'd have to grant they're good at that.)
But worthwhile thoughts?
steve · 23 July 2004
Same difference :-)
(to use an oxymoron, like Loving God (w/r/t the biblical one))
David Wilson · 31 July 2004
David Wilson · 31 July 2004