Information in biology

Posted 2 January 2008 by PvM

In the Stanford Encyclopedia of Philosophy, Peter Godfrey-Smith, Professor of Philosophy at Harvard University, provides a useful account of Shannon information as it is used in biology. Professor Godfrey-Smith is also the author of "Information and the Argument from Design," which was part of the collection edited by Robert Pennock titled Intelligent Design Creationism and Its Critics.

In the weaker sense, informational connections between events or variables involve no more than ordinary correlations (or perhaps correlations that are "non-accidental" in some physical sense involving causation or natural laws). This sense of information is associated with Claude Shannon (1948), who showed how the concept of information could be used to quantify facts about contingency and correlation in a useful way, initially for communication technology. For Shannon, anything is a source of information if it has a number of alternative states that might be realized on a particular occasion. And any other variable carries information about the source if its state is correlated with that of the source. This is a matter of degree; a signal carries more information about a source if its state is a better predictor of the source, less information if it is a worse predictor.

This way of thinking about contingency and correlation has turned out to be useful in many areas outside of the original technological applications that Shannon had in mind, and genetics is one example. There are interesting questions that can be asked about this sense of information (Dretske 1981), but the initially important point is that when a biologist introduces information in this sense to a description of gene action or other processes, she is not introducing some new and special kind of relation or property. She is just adopting a particular quantitative framework for describing ordinary correlations or causal connections. Consequently, philosophical discussions have sometimes set the issue up by saying that there is one kind of "information" appealed to in biology, Shannon's kind, that is unproblematic and does not require much philosophical attention. The term "causal" information is sometimes used to refer to this kind, though this term is not ideal. Whatever it is called, this kind of information exists whenever there is ordinary contingency and correlation. So we can say that genes contain information about the proteins they make, and also that genes contain information about the whole-organism phenotype. But when we say that, we are saying no more than what we are saying when we say that there is an informational connection between smoke and fire, or between tree rings and a tree's age

Godfrey-Smith also points out, as have many ID critics before him, how Dembski and other "proponents of 'Intelligent Design' creationism appeal to information theory to make their arguments look more rigorous." For instance, Dembski likes to use the term information rather than probability because the former lets him appeal to information theory, even though all he does is apply a mathematical transformation to a probability.

To assign a measure of information to the event, you just mathematically transform its probability. You find the logarithm to the base 2 of that probability, and take the negative of that logarithm. A probability of 1/4 becomes 2 bits of information, as the logarithm to the base 2 of 1/4 is -2. A probability of 1/32 becomes 5 bits of information, and so on. In saying these things, we are doing no more than applying a mathematical transformation to the probabilities. Because the term "information" is now being used, it might seem that we have done something important. But we have just re-scaled the probabilities that we already had.

That's the full extent of ID's appeal to information theory: take the negative base 2 logarithm of a probability.

Despite all the detail that Dembski gives in describing information theory, information is not making any essential contribution to his argument. What is doing the work is just the idea of objective probability. We have objective probabilities associated with events or states of affairs, and we are re-expressing these probabilities with a mathematical transformation.
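To see concretely how little this transformation does, here is a minimal Python sketch (my own illustration, not code from Dembski or Godfrey-Smith) of the probability-to-bits conversion described above:

    import math

    def bits_from_probability(p):
        """Re-express a probability as 'information' in bits: -log2(p).

        Nothing is added to the probability; it is only re-scaled
        onto a logarithmic axis.
        """
        return -math.log2(p)

    for p in (1/4, 1/32, 1/1024):
        print(f"probability {p:.6f} -> {bits_from_probability(p):.1f} bits")

    # probability 0.250000 -> 2.0 bits
    # probability 0.031250 -> 5.0 bits
    # probability 0.000977 -> 10.0 bits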

So what about some of the other terms used by ID proponents such as complexity? Surely that means something?

So far I have discussed Dembski's use of the term "information." Something should be said also about "complexity" and "specification," as Dembski claims that the problem for Darwinism is found in cases of "complex specified information" (CSI). Do these concepts add anything important to Dembski's argument? "Complexity" as used by Dembski does not add anything, as by "complex information" Dembski just means "lots of information."

Again we find that ID's usage of terminology adds nothing and only leads its followers into confusion, as they have come to believe that these terms mean something else. In other words, Complex Specified Information (CSI) all boils down to the following:

That completes the outline of Dembski's information-theoretic framework. Dembski goes on to claim that life contains CSI – complex specified information. This looks like an interesting and theoretically rich property, but in fact it is nothing special. Dembski's use of the term "information" should not be taken to suggest that meaning or representation is involved. His use of the term "complex" should not be taken to suggest that things with CSI must be complex in either the everyday sense of the term or a biologist's sense. Anything which is unlikely to have arisen by chance (in a sense which does not involve hindsight) contains CSI, as Dembski has defined it.

Or in other words

So, Dembski's use of information theory provides a roundabout way of talking about probability.

Back to the age-old creationist argument from improbability. Richard Wein, Mark Perakh, Wesley Elsberry, and many others have shown that Dembski's 'novel' approach is neither novel nor particularly relevant, as we lack sufficient resources to calculate the probabilities involved. In other words, ID is scientifically vacuous because all it can contribute is a calculation of the negative base 2 logarithm of a probability, and it cannot calculate said probability.

103 Comments

Ryan · 2 January 2008

I've been hammering the creationist argument about information a lot. It gets brought up over and over, and they never seem to acknowledge that people have addressed their concern before.

Check out the blog:

http://aigbusted.blogspot.com

-Ryan

Tex · 3 January 2008

proponents of “Intelligent Design” creationism appeal to information theory to make their arguments look more rigorous
One of the ways IDers like to use information theory is to claim that information cannot arise on its own from random assortments of molecules (and not just IDers - I think Paul Davies uses a similar argument). Although I am a biologist with no training in information theory, it has always seemed to me that this argument (like all of theirs) is bogus. There is more information inherent in a random collection of molecules than a highly ordered one. Using water as an example, given the position of one molecule in an ice cube of a given size, you could very easily specify the position of all the other water molecules with a minimum amount of information that specified the parameters of the crystal structure and distance between molecules. Given the same amount of water in a steam vapor and the position of one molecule, it would require much more information to specify the position of any other water molecule. You could encode a much more complex message using the position of the water molecules in the vapor than in the ice crystal. Similarly, there is more total information in a random collection of six billion nucleotides in a beaker than there is in my genome. The only difference is that I have RNA polymerases and ribosomes to decode what little information remains. As I said, I have no background in information theory, so please let me know if I have missed something.
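A rough way to put Tex's point in Shannon's terms (a sketch of my own, not anything from the comment itself) is to compare the per-symbol entropy of a perfectly ordered source with that of a maximally disordered one:

    import math

    def shannon_entropy(probabilities):
        """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
        return sum(-p * math.log2(p) for p in probabilities if p > 0)

    # A perfectly ordered source: one state occurs with certainty
    # (loosely, the 'ice crystal' case in the analogy above).
    ordered = [1.0]

    # A maximally disordered source: four equally likely states
    # (loosely, the 'vapor' case, or a uniformly random nucleotide).
    disordered = [0.25, 0.25, 0.25, 0.25]

    print(shannon_entropy(ordered))     # 0.0 bits per symbol
    print(shannon_entropy(disordered))  # 2.0 bits per symbol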

Joe Felsenstein · 3 January 2008

I think I have a different perspective. Specified information, which was defined originally by Leslie Orgel (as William Dembski himself says) is perfectly meaningful once we agree on what scale to rank individuals (and the relevant one is fitness). Dembski's critics point to his vagueness about what the scale is -- but defining it as fitness is the way to rescue that part of his argument. The rest of his argument, that deterministic and random functions can't create it, is where he is wrong, as I explain in my article in the current issue of Reports of the NCSE (the issue that isn't quite up on their web page yet).

The idea that random stuff has lots of information is mistaken. It has lots of things that need explaining but that is different from saying that it has a lot of specified information. It has none. If I ask you for some good hot tips on the coming horse races, and you say sure and send me random numbers, I ought to be royally ticked off, and should not congratulate you on sending me so much information. Because what you sent me does not help me at all.

Popper's Ghost · 3 January 2008

If I ask you for some good hot tips on the coming horse races, and you say sure and send me random numbers, I ought to be royally ticked off, and should not congratulate you on sending me so much information.

Sigh. How many times does it have to be said that Shannon information is not the same as semantic content? The sequence of random numbers contains a lot of information because it takes a lot of bits to transmit it.

JuliaL · 3 January 2008

That's the full extent of ID's appeal to information theory: take the negative base 2 logarithm of a probability.
Why the negative base 2 logarithm of a probability? In other words, why not the positive base 3? Why not divide by 112 before taking the negative base 2 logarithm? Why not just add 42?

Jeffrey Shallit · 3 January 2008

Joe, the whole concept of "specified information" is nonsense. A 'specification' means the object conforms to a pattern, i.e., is easy to describe. But 'information' in the sense it is understood by mathematicians and computer scientists measures to what extent an object is hard to describe; that is, how much it does not conform to a pattern.

I recommend reading Kolmogorov complexity and its applications by my colleagues Li and Vitanyi, or my long paper with Elsberry debunking the notion of 'specified complexity'.

fnxtr · 3 January 2008

Semantic content. I think that's in a way what WAD is trying to say, that life has meaning. But that's a philosophical argument, and he's trying to make it sound scientific by calling it 'information' instead of 'meaning'... or 'intent'... or 'purpose'...or 'design'. Circular and pointless.

snex · 3 January 2008

JuliaL: Why the negative base 2 logarithm of a probability?
because base 2 gives you the number of bits required, a "bit" being a 2-state function. if you used base 3 or base 4, you'd get "tribits" or "quadbits" which aren't pretty to work with.

Paul Burnett · 3 January 2008

The Stanford Encyclopedia of Philosophy also has an article on "Creationism," which contains the following statement in its conclusion:

"Scientifically Creationism is worthless, philosophically it is confused, and theologically it is blinkered beyond repair. The same is true of its offspring, Intelligent Design Theory."

mark · 3 January 2008

I forget--how did Dembski claim to specify information? It is so easy to look at some thing, and see a pattern in it. How much complex specified information can I detect in an ink blot, or in clouds? I can see patterns in biology, some that require a lot of words to describe, but many are easily explained by common descent or contingency.

Stanton · 3 January 2008

mark: I forget--how did Dembski claim to specify information? It is so easy to look at some thing, and see a pattern in it. How much complex specified information can I detect in an ink blot, or in clouds? I can see patterns in biology, some that require a lot of words to describe, but many are easily explained by common descent or contingency.
He alleged to have devised an "Explanatory Filter" with which to detect "Design." However, neither he nor any of his followers or associates have ever actually demonstrated, theoretically or physically, exactly how the "Explanatory Filter" works.

Merlin Perkins · 3 January 2008

Dembski's No Free Lunch explains csi in a way that I could eventually understand. It was quite convincing and I think most of you have missed it.

On an unrelated issue: Dawkins could not give an example of a mutation that increased information (by any definition). Is there an example of a mutation that improves the function of an enzyme or makes a new structure or is evolution in a positive direction?

Merlin

RBH · 3 January 2008

Mark asked
I forget–how did Dembski claim to specify information? It is so easy to look at some thing, and see a pattern in it.
That's essentially how Dembski does it. There is no principled (= mechanical) way to specify the pattern to which the object under analysis conforms. Is the pattern to which a bacterial flagellum conforms an outboard motor (as ID creationists typically characterize it), or a helicopter's rotor, or a flexible stick poking out of a blob, or an antenna whipping in the breeze, or what? Function also (occasionally?) enters into the identification of the (alleged) specification. But one is still in the definitional soup: Is the function of the flagellum appropriately described as to provide motility via a rotary motor and flexible rod (the usual ID creationist functional specification), or to provide motility, full stop, or to function so as to raise the probability that the organism will escape crowding, or will find food, or what? Again, there's no principled way to describe the function in Dembski's blathering about it.

Stanton · 3 January 2008

Merlin Perkins: Dembski's No Free Lunch explains csi in a way that I could eventually understand. It was quite convincing and I think most of you have missed it. On an unrelated issue: Dawkins could not give an example of a mutation that increased information (by any definition). Is there an example of a mutation that improves the function of an enzyme or makes a new structure or is evolution in a positive direction? Merlin
Then, please demonstrate how to use "csi" to detect design in, say, the heteromorph ammonite Nipponites mirabilis.

Dawkins appeared to have been unable to give an example of a mutation that "increased information" because his interviewers were creationists who lied in order to interview him, and they edited the footage in order to make him look foolish. In fact, Dawkins was actually contemplating having the interviewers thrown out of his office. http://www.talkorigins.org/indexcc/CB/CB102_1.html

If you actually knew how to do research, rather than unflinchingly swallow all of the lies creationists tell you, you would realize that there are countless research papers done on the positive identification of positive mutations, such as the three different versions of the enzyme nylonase in 2 different strains of Flavobacterium and 1 strain of Pseudomonas aeruginosa, the studies done on the appearance and evolution of the "antifreeze" gene in the Antarctic icefish of the suborder Notothenioidei, or even how heterozygous carriers of sickle cell anemia are capable of surviving virulent strains of malaria.

RBH · 3 January 2008

Merlin Perkins asked
On an unrelated issue: Dawkins could not give an example of a mutation that increased information (by any definition). Is there an example of a mutation that improves the function of an enzyme or makes a new structure or is evolution in a positive direction?
See the evolution of lactase (extending its functioning into adulthood in some populations) for an example of such an improvement in an enzyme. And see the Milano mutation for one that's particularly interesting to me, since I have coronary artery disease and have had an M.I. already. And quit blathering creationist bullshit.

Popper's Ghost · 3 January 2008

Dembski’s No Free Lunch explains csi in a way that I could eventually understand. It was quite convincing and I think most of you have missed it.

Why do ignorant people think that their being convinced of something is of any import?

Popper's Ghost · 3 January 2008

Why the negative base 2 logarithm of a probability?

See http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html

In other words, why not the positive base 3? Why not divide by 112 before taking the negative base 2 logarithm? Why not just add 42?

Because none of those operations would yield a measure of information in the sense of Shannon's theory.

Popper's Ghost · 3 January 2008

The idea that random stuff has lots of information is mistaken....Because what you sent me does not help me at all.

Perhaps mining your quote like this might indicate what is so very wrong about it. The world does not revolve around you. Information cannot be defined in terms of how it helps one specific recipient; Shannon information is an objective measure, and thus cannot be a matter of what a message tells you about horses or any other specific subject matter. The information measure pertains to the message itself as an abstract mathematical object. Since a random sequence is less redundant than a repetitive sequence, it has a higher information measure. Why is someone who is so ignorant of information theory that he can say "The idea that random stuff has lots of information is mistaken" writing articles about it for Reports of the NCSE?
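One way to make the redundancy point concrete (a sketch of my own, using only Python's standard library) is to compare how well a repetitive byte string and a random byte string compress; the random one resists compression and so needs more bits to transmit:

    import os
    import zlib

    repetitive = b"AB" * 5000          # 10,000 bytes built from a 2-byte pattern
    random_bytes = os.urandom(10000)   # 10,000 bytes of random data

    print(len(zlib.compress(repetitive)))    # a few dozen bytes
    print(len(zlib.compress(random_bytes)))  # roughly 10,000 bytes, or slightly more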

PD · 3 January 2008

PG nails it.

* NO SEMANTIC CONTENT IN INFORMATION THEORY

* INFORMATION THEORY IS USELESS TO ID

and everything else he said on this too.

Surprisingly there are even a couple of articles on one of the IDist websites saying the same thing. It makes me think the author was trying to get through to someone. It obviously didn't work, but that's what happens when people have concrete in their heads.

Muldoon · 3 January 2008

I have not been impressed with Dembski's work with regard to CSI. Behe's idea about IC seems a more plausible angle, if it could be positively demonstrated that there are no chemical pathways to certain structures. I ask the crowd here for your thoughts: if design is present in nature, is there any way to detect it, except for blatant messages to us in the DNA?

Raging Bee · 3 January 2008

Muldoon: You'd have to give us an exact definition of "design" before anyone could objectively verify whether or not it is "detected" in nature. "It looks designed (to me), therefore it is designed" doesn't cut it, especially for an artsy type like me who sees "design" of sorts in ice crystals.

Mike Elzinga · 3 January 2008

Some of the confusion and difficulties associated with attempting to apply information theory to biology can be illustrated with a simple analogy to dendritic growth (e.g., icicles, stalactites, or other mineral or neuronal growth). Initially, before a specified branch of a dendrite has developed, the probability of its development would appear to be very low. However, this misconception arises because the branch is specified, and it ignores the contingencies that could just as easily have produced others.

In reality, millions of possibilities are available before a particular branch develops, and once a branch starts (because of contingencies existing in the system or its environment), the probabilities of its continuation may be quite large. This is just another way of saying that the probability of reaching certain states is dependent on what nearby states are available. Singling out a particular branch and asserting that its probability is low relative to the background states from which it developed (and therefore special in some way) is misleading because millions of other branches could just as easily have developed.

It may be meaningful in some way to calculate the probability (or, equivalently, the entropy or “information”) of that particular branch relative to nearby states, but that doesn’t take into account the millions of other branches that could have but didn’t develop. Given the right conditions and energy throughput, dendritic growth may be inevitable for a given system; it’s just the particular configuration that appears improbable if one ignores the broader picture.

Similarly, singling out particular features of an organism and suggesting that they are improbable (and therefore must be designed) ignores the record of evolutionary history. The fact that there are so many varied life forms that exist, and have existed, suggests that a large number of life forms are possible within the energy ranges and conditions on this planet. Just because other forms didn’t develop doesn’t mean that, given suitable contingencies, they couldn’t have developed. The variety of life on this planet appears to be in a constant state of flux. And the evolutionary tree does appear to be much like constantly changing dendritic growth. Species and their characteristics arise against a background of contingencies, and once they are established and selected, they form a template for further development along a particular branch (at least temporarily until further contingencies wipe out the branch).

When Dembski asserts something is improbable (and hides this assertion in a negative logarithm to base 2), he is arrogantly assuming he knows specifically how the current state of an organism was achieved and that there were no alternative organisms that could have developed. It’s like he gets dealt a 52-card hand from a shuffled deck of cards and concludes that the amount of information contained in his hand is minus log2(1/52!).
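That last quantity is easy to put a number on; a small sketch (my own arithmetic, only to show the scale of the card example):

    import math

    # Probability of one particular ordering of a shuffled 52-card deck.
    p = 1 / math.factorial(52)

    # The Dembski-style 'information' assigned to that specific ordering.
    bits = -math.log2(p)
    print(round(bits, 1))  # roughly 225.6 bits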

JGB · 3 January 2008

I think some of your terminology is muddled, Muldoon; however, I think your question, if I am interpreting it correctly, has some merit. The problem lies in proving there are no viable mutation pathways from point A to point B. This is not impossible, just incredibly difficult to do in a truly rigorous sense. You have to account for an impractical amount of possible contingencies if your initial set of conditions fail to be viable, as well as the wide variety of possible pathways. On the other hand, in cases where researchers have tried to fill in potential intermediates they have found a fair amount of success, suggesting that in practice we will find molecular intermediates, given a reasonable amount of time and resources, for a wide variety of systems. It gets much dicier if you want to talk about any gene, because many of them are not nearly as well suited to experimental characterization.

Popper's Ghost · 3 January 2008

Behe’s idea about IC seems more a plausible angle, if it could be positively demonstrated that there is no chemical pathways to certain structures.

How are you going to positively demonstrate a universal negative? Behe tried to present a logical argument, but it's based on a ridiculous strawman version of evolution as a strictly additive process.

Henry J · 3 January 2008

Tex,

There is more information inherent in a random collection of molecules than a highly ordered one. Using water as an example, given the position of one molecule in an ice cube of a given size, you could very easily specify the position of all the other water molecules with a minimum amount of information that specified the parameters of the crystal structure and distance between molecules. Given the same amount of water in a steam vapor and the position of one molecule, it would require much more information to specify the position of any other water molecule. You could encode a much more complex message using the position of the water molecules in the vapor than in the ice crystal. Similarly, there is more total information in a random collection of six billion nucleotides in a beaker than there is in my genome. The only difference is that I have RNA polymerases and ribosomes to decode what little information remains. As I said, I have no background in information theory, so please let me know if I have missed something.

I'd think that to be relevant, the information has to be at least somewhat persistent. Positions of molecules in a vapor change constantly, so there's no persistence there.

Muldoon,

if design is present in nature, is there any way to detect it,

Don't look for "design", look for side effects of the engineering methods used to implement it.

Henry

Popper's Ghost · 3 January 2008

The problem lies in proving there are no viable mutation pathways from point A to point B.

No, there's no point A; the claim is that IC systems couldn't have evolved.

Shebardigan · 3 January 2008

And all this time I'd been saddled with the impression that the word "specified" within the expression "CSI" had been put in there by the usual suspects in order to indirectly introduce the concept of a "designer" at the ground floor of the whole gallimaufry.

After all, something can't be "specified" unless some specifier has engaged in specifying. The CSI in the bacterial genome that produces a flagellum is clearly exactly equal, functionally, to a CNC tape for a milling machine or a set of cards for a Jacquard loom. Both of those are complex, informative, and would have been specified by some engineer or designer.

Tex · 3 January 2008

Henry J.

I’d think that to be relevant, the information has to be at least somewhat persistent. Positions of molecules in a vapor change constantly, so there’s no persistence there.

This does not argue against my point that there is a wealth of information there, no matter how useful you may find it.

If, however, you insist on persistence, then just change my analogy to a pile of 1000 bricks stacked in a 10 X 10 X 10 array versus the same number of bricks randomly scattered over a lawn where they will persist in this arrangement until serious effort is expended to move them.

Popper's Ghost · 3 January 2008

If, however, you insist on persistence, then just change my analogy to a pile of 1000 bricks stacked in a 10 X 10 X 10 array versus the same number of bricks randomly scattered over a lawn where they will persist in this arrangement until serious effort is expended to move them.

Yes, it takes a lot less information to precisely specify the former configuration than the latter.
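A back-of-the-envelope version of that comparison (my own made-up encoding, just to illustrate the point): the stacked array is pinned down by a handful of parameters, while the scattered bricks each need their own coordinates.

    # Ordered case: a 10 x 10 x 10 stack is described by a few numbers
    # (origin, orientation, brick dimensions, and the three counts) --
    # say, ten 32-bit values.
    ordered_description_bits = 10 * 32
    print(ordered_description_bits)       # 320 bits

    # Scattered case: each of the 1000 bricks needs its own position and
    # orientation, say six 32-bit values per brick.
    scattered_description_bits = 1000 * 6 * 32
    print(scattered_description_bits)     # 192000 bits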

Joe Felsenstein · 4 January 2008

My, there are a lot of strong opinions here! Popper's Ghost said
Why is someone who is so ignorant of information theory that he can say “The idea that random stuff has lots of information is mistaken” writing articles about it for Reports of the NCSE?
Without getting into the subject of how much of an ignoramus I might be, let me just explain why I might take a different view as to whether it is reasonable to use the phrase Specified Information. If someone says that our genome contains a lot of information making us well-adapted, is that an unreasonable thing to say? On the view of Popper's Ghost, it would be unreasonable -- for that genome would contain, in PG's view, (almost) exactly the same information content as a random string of nucleotides of the same length.

Our genomes undoubtedly have evolved so that a genotype is in a very small fraction of all possible sequences, one which is in the far upper tail of the distribution of fitnesses. We can legitimately discuss how much information I have to send over a channel to enable you to choose a genome that is that far into that tail of the distribution out of a mix of all possible genotypes. And it is much less than implied by the length of the genome.

Dembski is quite vague about how his Specification works, and has been appropriately criticized for that. But I think that if we concentrate on one measure -- fitness -- we can focus more clearly on his argument. He is arguing, in effect, that natural selection cannot get us into that upper tail of the fitness distribution, cannot build a lot of Whatchamacallit into the genome. As the genome contains lots of this Whatchamacallit, he then argues that this makes the case for a Designer. He is wrong about that, but only because his argument that natural selection can't do the job is wrong. As it is legitimate to discuss how much Whatchamacallit a genome has, that part of his argument is not the weak part, as it can be fixed. The natural-selection-can't-do-it part is the weak part.

Some would say he shouldn't use the word "information" but I would argue that it's OK to. Your mileage may vary.

Jeff Shallit pointed out Dembski's identification of his CSI with Kolmogorov complexity, and I agree he got this backwards and that his use of the Kolmogorov framework is a non-starter. It is a diversion from this other argument. I suspect we could go back and forth forever on this. :-(

Jeffrey Shallit · 4 January 2008

Joe, the term "information" has precise meaning for mathematicians and computer scientists under the rubric of Shannon or Kolmogorov information. If you want to call the "Whatchamacallit" in the genome "information", you are going to confuse the rest of us who already have a different definition, established for many years, in mind. Why not give it another name entirely?

Shebardigan · 4 January 2008

Jeffrey Shallit: Joe, the term "information" has precise meaning for mathematicians and computer scientists under the rubric of Shannon or Kolmogorov information. If you want to call the "Whatchamacallit" in the genome "information", you are going to confuse the rest of us who already have a different definition, established for many years, in mind. Why not give it another name entirely?
Specifically, let's not cooperate with the (perhaps deliberate) conflation of at least two definitions of "information" that underlies the Dembski & Co sciency-sounding horsefeathers.

Mike Elzinga · 4 January 2008

If someone says that our genome contains a lot of information making us well-adapted, is that an unreasonable thing to say?
Joe, it may in fact be misleading to say this. "Well-adapted" doesn't mean that "information" was placed in the coding of an organism to make it fit with its current environment. Putting it this way is similar to suggesting that the shape of a puddle of water was "well-adapted" to the hole in which it found itself.

Organisms develop and are selected by contingencies within the environment. Thousands of arrangements may be possible, (in fact there is usually a distribution of traits within a reproducing population of organisms), but they get sorted, with some being favored more than others. Change either the environment or the distribution of traits, and something else falls out. Singling out the thing that falls out as somehow being special is at the root of the confusion. It ignores the history of evolution which shows that an enormous range of possibilities exists and that some of these persist for a time because they are just robust enough to survive in their current environments.

It's somewhat like the difference between saying that "Mrs. Snodgrass at a specified address will win the lottery" and "Mrs. Snodgrass at a specified address has won the lottery".

Our genomes undoubtedly have evolved so that a genotype is in a very small fraction of all possible sequences, one which is in the far upper tail of the distribution of fitnesses.

The genome is analogous to a template once something falls out. It provides a substrate for the distribution of traits on which selection will take place.

If we look at a simpler (but, actually, not much more simple) example of crystal growth, we get the picture. Once a seed has laid down a template for further crystal growth, the subsequent development of the crystal will follow the patterns of the underlying atomic structure of the seed. If, however, defects (mutations) are introduced by environmental contingencies (radiation induced defects, turbulence, physical forces, etc.), the path of subsequent crystal development is changed.

Calling this "information" is somewhat misleading unless one is careful to state clearly what is meant. One can calculate the probabilities (or entropy, or "information") of a particular configuration relative to an underlying initial state, but if you don't allow for the possibility that this particular configuration is but one of perhaps millions of possible configurations, your calculation is misleading. The fact that living organisms are "fragile" relative to the environment is probably part of the reason that the ones that survive are mistakenly seen to be special. But, in fact, many other organisms have been and are possible.

The Creationists Henry Morris and Duane Gish deliberately introduced the confusions about probabilities, and this confusion has been extended (and relabeled as "information") by Dembski, Behe, et al. into the realm of microbiology. They have to do this to pacify their constituency on sectarian matters of biblical inerrancy because humans are the goal of creation in their world view. As far as we know, evolution has no particular goal, hence, "information" about a particular outcome gets misused without proper care about what probabilities are being calculated and recast into this form.

Joe Felsenstein · 4 January 2008

Jeffrey Shallit: Joe, the term "information" has precise meaning for mathematicians and computer scientists under the rubric of Shannon or Kolmogorov information. If you want to call the "Whatchamacallit" in the genome "information", you are going to confuse the rest of us who already have a different definition, established for many years, in mind. Why not give it another name entirely?
It's going to be hard to find a different one, because "information" is also a word used in colloquial English as well as in the languages of Mathematics and of Computer Science. If I need to have some idea which of 8 horses will win a race, and you are the inside tipster and give me the word that the race is fixed and horse number 3 will definitely win, and it does, maybe I should say "thanks for the Whatchamacallit, it was Totally Tubular!" Or maybe ...

Eric Finn · 4 January 2008

Mike Elzinga, I think you have (once again) made a good comment.
Mike Elzinga: Organisms develop and are selected by contingencies within the environment. Thousands of arrangements may be possible, (in fact there is usually a distribution of traits within a reproducing population of organisms), but they get sorted, with some being favored more than others. Change either the environment or the distribution of traits, and something else falls out. Singling out the thing that falls out as somehow being special is at the root of the confusion. It ignores the history of evolution which shows that an enormous range of possibilities exists and that some of these persist for a time because they are just robust enough to survive in their current environments. [Emphasis added]
Information theory (using the concepts from Shannon or from Kolmogorov) is a powerful tool to study a broad class of phenomena. However, it is important to understand, or at least define, the basic concepts one wishes to study in a complicated system. Otherwise the interpretation of the results can lead to erroneous conclusions. Regards Eric

Carl Bergstrom · 4 January 2008

Hello everyone, I've enjoyed seeing the enthusiasm for this subject, which I believe to be both a fascinating and important question within evolutionary biology. But I'm a bit troubled that some are so quick to assume ignorance on the part of others. For example, "Popper's Ghost" wrote about Joe Felsenstein
Why is someone who is so ignorant of information theory that he can say "The idea that random stuff has lots of information is mistaken" writing articles about it for Reports of the NCSE?
Let me suggest four possible reasons:

1) Up until the sentence quoted, Joe is writing about Specified Information, not Shannon information. His claim is correct about specified information, and about any number of other senses of the word information. The claim would be incorrect only if Joe had written "Shannon information" or comparable -- and he did not.

2) The NCSE is happy to have a paper from an author who receives over 2000 citations a year for his work in population genetics and who wrote the most cited paper I can find anywhere on Google scholar.

3) Joe has been publishing solid work on the relationship between information theory and evolutionary biology since before many of the posters to this thread were born -- and thus may know more about information theory than some here seem to have surmised.

4) Joe is aware of, and perhaps alluding to, our recent work on the relation between information and natural selection, which suggests surprising connections between information and Darwinian fitness. This starts to relate semantic aspects of meaning to information-theoretic quantities, and suggests that perhaps fitness may even be viewed as a type of information measure.

Perhaps others can suggest additional possibilities.

Best regards,
Carl Bergstrom

Joe Felsenstein · 4 January 2008

Thanks, Carl. That ought to do a good job of covering up the fact that I am an ignoramus. Check's in the mail.

Popper's Ghost · 4 January 2008

Without getting into the subject of how much of an ignoramus I might be, let me just explain why I might take a different view as to whether it is reasonable to use the phrase Specified Information.

I said nothing about that; what I said was that your claim that "The idea that random stuff has lots of information is mistaken" is factually wrong. Calling it a "strong opinion" is like saying that "IC systems can evolve" is a "strong opinion". In addition to being that, it's a fact.

Popper's Ghost · 4 January 2008

On the view of Popper’s Ghost, it would be unreasonable – for that genome would contain, in PG’s view, (almost) exactly the same information content as a random string of nucleotides of the same length.

No, that's totally wrong; this is not my view, and it exhibits considerable confusion, and your confusion has led others to share in your confusion and wrongly reject the information view of the genome. If the genome is taken to be a message about the environment (actually the history of environments of the organism's predecessors), so must a random string of nucleotides, as it's in the same language. But what environment is a random string of nucleotides a message about? Not ours. Rather, if one were to decode that string, one would end up with an unpredictable environment incapable of producing viable organisms -- its unpredictability means that a much larger amount of information would be required to encode it. That we don't find organisms with random strings of nucleotides is precisely because our genome reflects the predictable environment in which we live, but that certainly doesn't mean that a random sequence would have less information; random sequences aren't relevant, and talking about them sows confusion. Get yourself an education in the mathematics of information theory, or you'll screw up the details and educated people won't be able to appreciate your valid points.

Popper's Ghost · 4 January 2008

Up until the sentence quoted, Joe is writing about Specified Information, not Shannon information. His claim is correct about specified information, and about any number of other senses of the word information. The claim would be incorrect only if Joe had written “Shannon information” or comparable – and he did not.

No, I'm afraid you're wrong here, because Joe responded to what someone else wrote, and so he's talking about what that person was talking about. He "corrected" Tex's claim that "There is more information inherent in a random collection of molecules than a highly ordered one", but Tex was right and Joe is wrong. And anyone who knows anything about information theory must know that Tex was right, for the standard mathematical concept of information. To say Tex was "mistaken" is ignorant.

Joe has been publishing solid work on the relationship between information theory and evolutionary biology since before many of the posters to this thread were born – and thus may know more about information theory than some here seem to have surmised.

I don't care what he's published, he's still wrong about fundamental facts of information theory. That doesn't necessarily mean that he's wrong about the genome being reasonably viewed as a message about the history of the organism, but his work will unfortunately be discounted by people familiar with information theory if he doesn't get the fundamentals right.

Henry J · 4 January 2008

It’s going to be hard to find a different one, because “information” is also a word used in colloquial English as well as in the languages of Mathematics and of Computer Science.

Yeah, colloquially "information" means data that's useful to somebody. A subjective judgment if ever there was one. Henry

Popper's Ghost · 4 January 2008

Yeah, colloquially “information” means data that’s useful to somebody. A subjective judgment if ever there was one.

Exactly; it's a useless concept for science. I tried to make this point about the horse race. A random sequence doesn't contain less information just because it isn't useful to the person betting on the horses. It might be very useful to the spy who is only pretending to bet on the horses but has actually received a message made from a one-time pad. In the colloquial sense, people wouldn't say that "reecu qasea flk4x" contains less information than "I have a banana"; such a comparison is pointless. Rather, they would say the former message is malformed. In the same way, we might say that a random sequence of nucleotides is malformed -- it cannot produce an organism and cannot be a product of evolution. Talking about whether it has more or less information than some other sequence of nucleotides is talking nonsense unless we're talking about Shannon information.

Popper's Ghost · 4 January 2008

Thanks, Carl. That ought to do a good job of covering up the fact that I am an ignoramus. Check’s in the mail.

It can't be covered up with his own ignorance and appeals to authority. The claim that “The idea that random stuff has lots of information is mistaken” is not correct for any formally definable notion of information, whether you call it "specified information" or anything else.

Popper's Ghost · 4 January 2008

For anyone who thinks that "specified information" might be a useful concept, distinct from Shannon information, see http://www-lmmb.ncifcrf.gov/~toms/paper/ev/dembski/specified.complexity.html
What is "Specified"? This seems to be that the 'event' has a specific pattern. In the book he runs around with a lot of vague definitions. "Specification depends on the knowledge of subjects. Is specification therefore subjective? Yes." Page 66 That means that if, and only if, Dembski says something is specified, it is. That's pretty useless of course....
What is "Information"? Dembski mentions Shannon uncertainty (p. 131) without objection as the average of the 'surprisal'. He accepts my use of the Rsequence measure too (p. 212-218). So I will take information to be the Shannon information. ...

Popper's Ghost · 4 January 2008

Joe is aware of, and perhaps alluding to, our recent work on the relation between information and natural selection, which suggests surprising connections between information and Darwinian fitness. This starts to relate semantic aspects of meaning to information-theoretic quantities, and suggests that perhaps fitness may even be viewed as a type of information measure.

I think this is a fruitful approach and have argued something along these lines myself (note my comment above about seeing the genome as a message about the history of environments of the organism), but it is undermined by ignorant statements like "The idea that random stuff has lots of information is mistaken". "random stuff" isn't relevant in this context because natural selection doesn't produce random organisms, random genomes, or random anything else. It's the very non-randomness of the result that points to an algorithmic selection process.

Eric Finn · 5 January 2008

Popper's Ghost, please excuse my ignorance.
Popper's Ghost: The claim that “The idea that random stuff has lots of information is mistaken” is not correct for any formally definable notion of information, whether you call it "specified information" or anything else.
According to Kolmogorov's complexity criterion, random stuff is highly complex. Kolmogorov complexity is sometimes called "algorithmic information". In this sense, random stuff does, indeed, have lots of information. If we apply Shannon's criteria to random stuff, the result seems to be just the opposite. Random stuff is just noise without any information contents. I am aware that Shannon originally discussed information transmitted over a transmission line. Basically, the same equations appear in context with entropy. It has been suggested that information is negative entropy (negentropy). My question is: When should we use Kolmogorov and when Shannon? I am aware that simple questions may be hard (or tedious) to answer. If this is the case here, I will not push for a complete answer, just the direction. I will work my way therefrom the best I can. Regards Eric

Popper's Ghost · 5 January 2008

If we apply Shannon’s criteria to random stuff, the result seems to be just the opposite. Random stuff is just noise without any information contents.

By "information contents", what you really mean is signal, as in "signal-to-noise ratio". Shannon's concept of information is an objective measure which doesn't distinguish between contentful (signal) and non-contentful (noise). And "noise" and "random" are in no way synonyms; the noise can be highly ordered -- think punk rock or any other kind of music you don't like :-), an interfering radio station, etc. Note that noise introduces uncertainty, and information reduces uncertainty. Thus the more noise, the more information must be transmitted over the channel. Channels without noise don't need parity bits or other redundancies. The notion of randomness relates (via algorithmic information theory) to compressibility. More compressible sequences require fewer bits; random sequences, being uncompressible, require the maximum number of bits. Bits are Shannon's measure of information. See http://www.stanford.edu/class/ee104/shannonpaper.pdf

Following Nyquist and Hartley, it is convenient to use a logarithmic measure of information.... We will use the base 2 and call the resulting units binary digits or bits.
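The same idea can be seen without a compressor: an ordered string has a short description (a small generating expression), while a random string of the same length has no description much shorter than the string itself. A minimal sketch (my own illustration, with a made-up example string):

    import random

    # An ordered string can be regenerated from a tiny description.
    ordered = "AB" * 5000
    short_description = '"AB" * 5000'   # 11 characters suffice

    # A random string has to be stored more or less literally;
    # no much shorter description of it exists.
    random_string = "".join(random.choice("ACGT") for _ in range(10000))

    print(len(short_description), len(ordered), len(random_string))  # 11 10000 10000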

Jeffrey Shallit · 5 January 2008

Joe:

You say "It’s going to be hard to find a different one, because “information” is also a word used in colloquial English as well as in the languages of Mathematics and of Computer Science. "

Sorry, I find this objection silly. The words "field" and "group" also have meanings in colloquial English, but if you talk about "elements in a group" or "field extensions", no one is going to think you are talking about their colloquial meanings.

If you want to give a mathematical definition of information that differs from the standard, call it something else.

Carl Bergstrom:

Dembski's "specified information" is completely bogus. Read my long paper with Elsberry.

Eric Finn · 5 January 2008

Popper's Ghost, thank you for your reply. You addressed the items that confused me.
Popper's Ghost: By "information contents", what you really mean is signal, as in "signal-to-noise ratio". Shannon's concept of information is an objective measure which doesn't distinguish between contentful (signal) and non-contentful (noise).
Yes, I agree. Does this mean that Shannon's concept is applicable only when there is a transmitter (of a message) and a receiver to interpret that message? Then, Shannon's equation would not be useful in estimating "information content" of a system that we do not expect to contain a message?

And "noise" and "random" are in no way synonyms; the noise can be highly ordered -- think punk rock or any other kind of music you don't like :-), an interfering radio station, etc.

We are now discussing "noise" in its colloquial meaning. I have equated (maybe incorrectly) noise with randomness, (white) noise being a pure example of randomness.

Note that noise introduces uncertainty, and information reduces uncertainty. Thus the more noise, the more information must be transmitted over the channel. Channels without noise don't need parity bits or other redundancies.

Understood (I hope).

The notion of randomness relates (via algorithmic information theory) to compressibility. More compressible sequences require fewer bits; random sequences, being uncompressible, require the maximum number of bits. Bits are Shannon's measure of information. See http://www.stanford.edu/class/ee104/shannonpaper.pdf

I think this is essential. How do we apply the compressibility of a given data set to understand complicated systems (such as biological systems)? I am very grateful you provided the link (shannonpaper). Regards Eric

Torbjörn Larsson, OM · 5 January 2008

Thanks PvM, that was an interesting article.

If I may review it from my admittedly dim understanding (not using any of the two common information theories in my work), Godfrey-Smith has extensively covered "information" as used in biology. He is correctly noting the obvious problem of accepting Shannon's description of a physical observable yet speculating in formulations that mistake information for a physical object.

This philosophical description is however not how a scientist would have described a scientific area. We have two different models of information (Shannon theory and algorithmic information theory) that would need to be covered. Godfrey-Smith doesn't mention the latter, even when discussing coding issues. Those two models already have only limited applications in physics, which would warrant a description in such a review, and it would be surprising if biology found information a better tool.

Finally, I think Godfrey-Smith may have less of a grip on signal theory. In its purest form signal theory isn't about what information (correlations) a signal codes of a source system, but how to detect and decode a predetermined signal among noise. This can even be done in a model-less fashion.

Source system identification (modelling), whether as a black box or as a physically informed model, is the domain of systems and control theory. I think that may be a point of the Developmental Systems Theory Godfrey-Smith mentions, but you wouldn't recognize it from his description. I have no idea of how successful systems identification can be of biological systems, with their likely distributed, changing (during growth, repair, sickness, et cetera), and contingent components. It is hard enough to reverse-engineer when you have systems with reasonably static and localizable but unobservable internal nodes, say IC circuits.

I think the latter type of review would have prepared the reader for an informed discussion along the lines of Shallit, Elzinga and PG. Each has obviously done some thinking about the applicability of information in biology, and I'm enjoying the opportunity to learn more.

Torbjörn Larsson, OM · 5 January 2008

But what environment is a random string of nucleotides a message about? Not ours. Rather, if one were to decode that string, one would end up with an unpredictable environment incapable of producing viable organisms – its unpredictability means that a much larger amount of information would be required to encode it.
Admirably clearly put. It can't be the stochastic distribution as such that necessitates an algorithmic encoding (because a very short string would describe, say, a gaussian), but the unpredictability of each datum. I may be mistaken, but I believe that the semantic coding concepts that show up in some physics and, here, biological circles are connected to systems identification (i.e. what the correlations mean in terms of the sender's system state). If so they represent empirical knowledge (model or theory) rather than observable information (correlations or uncertainty). I wouldn't like to conflate two such clearly separated concepts, with their different scopes and methods.
... the whole concept of "specified information" is nonsense. A 'specification' means the object conforms to a pattern, i.e., is easy to describe. But 'information' in the sense it is understood by mathematicians and computer scientists measures to what extent an object is hard to describe; that is, how much it does not conform to a pattern.
The very point made by computer scientist Mark Chu-Carroll here, I believe:
This is so very wrong that it demonstrates a total lack of comprehension of what K-C theory is about, how it measures information, or what it says about anything. No one who actually understands K-C theory would ever make a statement like Dembski's quote above. No one. But to make matters worse - this statement explicitly invalidates the entire concept of specified complexity. What this statement means - what it explicitly says if you understand the math - is that specification is the opposite of complexity. Anything which possesses the property of specification by definition does not possess the property of complexity. In information-theory terms, complexity is non-compressibility. But according to Dembski, in IT terms, specification is compressibility. Something that possesses "specified complexity" is therefore something which is simultaneously compressible and non-compressible. [Emphasis mine.]
Mark explains later to the slow comprehender [me] that the problem isn't a search for an unspecified balance between compressibility ("specified") and non-compressibility (complexity) but that the concept is a self-conflicting search for two different optima (simplicity and complexity) simultaneously among one single observable. It is simply not well-defined.

Torbjörn Larsson, OM · 5 January 2008

Dawkins could not give an example of a mutation that increased information (by any definition).
Huh? Dawkins has a very elaborate understanding of what information means as regards biology:
But it still remains true that natural selection is a narrowing down from an initially wider field of possibilities, including mostly unsuccessful ones, to a narrower field of successful ones. This is analogous to the definition of information with which we began: information is what enables the narrowing down from prior uncertainty (the initial range of possibilities) to later certainty (the "successful" choice among the prior probabilities). According to this analogy, natural selection is by definition a process whereby information is fed into the gene pool of the next generation. If natural selection feeds information into gene pools, what is the information about? It is about how to survive. Strictly it is about how to survive and reproduce, in the conditions that prevailed when previous generations were alive. To the extent that present day conditions are different from ancestral conditions, the ancestral genetic advice will be wrong. In extreme cases, the species may then go extinct. To the extent that conditions for the present generation are not too different from conditions for past generations, the information fed into present-day genomes from past generations is helpful information. Information from the ancestral past can be seen as a manual for surviving in the present: a family bible of ancestral "advice" on how to survive today. We need only a little poetic licence to say that the information fed into modern genomes by natural selection is actually information about ancient environments in which ancestors survived.
If Dawkins doesn't give an example of a mutation that increases such Shannon information as he defined it above, it is because mutations clearly don't do this in such a model:
Random genetic error (mutation), sexual recombination and migratory mixing, all provide a wide field of genetic variation: the available alternatives. Mutation is not an increase in true information content, rather the reverse, for mutation, in the Shannon analogy, contributes to increasing the prior uncertainty. But now we come to natural selection, which reduces the "prior uncertainty" and therefore, in Shannon's sense, contributes information to the gene pool. In every generation, natural selection removes the less successful genes from the gene pool, so the remaining gene pool is a narrower subset. The narrowing is nonrandom, in the direction of improvement, where improvement is defined, in the Darwinian way, as improvement in fitness to survive and reproduce.
Or as Gary Hurd put it:
“If evolution is a car, then reproduction is the engine, mutation is the fuel, and natural selection provides the steering wheel.”
How would a fuel (increase in uncertainty) provide steering (decrease in uncertainty)? To say that mutations create information is akin to the mistake a creationist makes when he claims, against available knowledge, that adaptation happens by the mechanism of mutation alone, instead of, say, heritable variation with selection. And who would be so stupid and uninformed? Apparently not Dawkins.
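Dawkins's Shannon analogy can be put in numbers with a toy gene pool: mutation spreads probability over more alleles (raising the prior uncertainty), while selection concentrates it on the fitter ones (lowering it). A minimal sketch, with made-up allele frequencies and fitnesses:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

freqs = np.array([0.70, 0.15, 0.10, 0.05])    # allele frequencies in the pool
fitness = np.array([1.00, 0.90, 0.50, 0.20])  # hypothetical relative fitnesses

# Mutation: mix in a little uniform noise -> prior uncertainty goes up.
mutated = 0.95 * freqs + 0.05 * np.ones(4) / 4

# Selection: weight by fitness and renormalise -> uncertainty goes down.
selected = mutated * fitness
selected /= selected.sum()

print(entropy(freqs), entropy(mutated), entropy(selected))
```

Mutation nudges the entropy up; selection pulls it back down below where it started, which is the "narrowing down from prior uncertainty" Dawkins describes.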

Torbjörn Larsson, OM · 5 January 2008

I failed to emphasize Dawkins's description of what information means in such a Shannon analogy that he contemplated. (Which btw PG here also describes.) I think it is so interesting that it deserves to be so treated, even if I have mercilessly commented at length already:
If natural selection feeds information into gene pools, what is the information about? It is about how to survive. Strictly it is about how to survive and reproduce, in the conditions that prevailed when previous generations were alive. To the extent that present day conditions are different from ancestral conditions, the ancestral genetic advice will be wrong. In extreme cases, the species may then go extinct. To the extent that conditions for the present generation are not too different from conditions for past generations, the information fed into present-day genomes from past generations is helpful information.

Merlin · 5 January 2008

Does the motivation of the questioner somehow make the question not legitimate?

Dawkins did not answer the question because he could not. And when he tried, he answered a different question. Several of you have at least tried to answer my question. Good.

Intuitively, (what one relies on when one is ignorant), one knows specification when he sees it. (Wasn't it Dawkins who acknowledged this when he told his students that what they were seeing is only the appearance of design?) And intuitively one knows what information is, and intuitively, YOU know that you cannot make random changes to any highly complex, functioning system and expect improvement. Would you go into an automated factory, make a few random changes and go to the shipping dock to await the new products? If you do, take a book. You'll be there a while.

Now, why should I doubt my intuition? If there were good examples of where it was wrong, I would begin to doubt. (The lactase example was not convincing. The Milano one was better.) Or if there were a good model of how NS could weed out all the bad mutations and keep the ones that increased information/improved function/created new structures, then you might be on to something. But along with what might be improved function/increased information there is clearly a lot of the opposite. Five thousand plus known genetic diseases? Wouldn't any increased information/improved function be swamped by the garbage?

Merlin

Joe Felsenstein · 5 January 2008

Popper's Ghost:
The claim that “The idea that random stuff has lots of information is mistaken” is not correct for any formally definable notion of information, whether you call it “specified information” or anything else.
That we don’t find organisms with random strings of nucleotides is precisely because our genome reflects the predictable environment in which we live, but that certainly doesn’t mean that a random sequence would have less information; random sequences aren’t relevant, and talking about them sows confusion.
If the genome is taken to be a signal about something else (say the environment), then a random signal does not reduce our uncertainty about the environment: since such a signal X is independent of the environment Y, H(Y|X) = H(Y), and thus the joint information is zero, as I(X;Y) = H(Y) - H(Y|X) = 0. In the original post to which Popper's Ghost was objecting, I was reacting to a statement that said that
There is more information inherent in a random collection of molecules than a highly ordered one.
I started talking about specified information, then said that
The idea that random stuff has lots of information is mistaken. It has lots of things that need explaining but that is different from saying that it has a lot of specified information. It has none.
The poster I was reacting to (Tex) wasn't necessarily talking about specified information, or about joint information. I was. A long random signal could potentially convey lots of information (even though it doesn't have any joint information about anything) and so, yes, formally it contains lots of information. So, yes, Popper's Ghost, my statement about Tex's posting was mistaken and yes, in that respect I was wrong. Crow away.

However ... When people say that natural selection builds lots of information into our genome, they must be talking about something other than Shannon information. They are implicitly talking about information that conveys joint information about something (such as what the organism's phenotype should be to be highly fit). That is what Dembski was trying to get at with his advocacy of specified information, and what Orgel was getting at when he originated talking about specification.

Carl Bergstrom and I are almost alone among posters here in thinking that the idea of specified information is not useless or mistaken. The articles that have been cited here as proving that SI is wrong are ones that point out (correctly) that Dembski did not have a clear definition of what the genome was supposed to be conveying information about. (And they also pointed out that he brought in Kolmogorov information in a backwards way that doesn't help). But underlying Dembski's argument is his belief that the adaptive information contained in our genome cannot have been put there by natural selection. He is wrong about that, it can be put there by selection. If we fix Dembski's notion by saying that the specification is high fitness, his SI is relevant and not wrong -- but his argument that natural selection cannot achieve it is still totally wrong.

In the horse race example, let's imagine a crooked race track where they fix every race. Eight horses run in each race, and they choose one at random each time and fix the race so it wins. You have an informant inside their operation and the informant sends a signal each time saying (correctly) who is going to win. That is joint information about the outcome, and conveys 3 bits worth of it each time. If the informant instead sent the complete encoded contents of a mediocre Victorian novel, that would technically be lots of information. It might be very useful to a student of literature. But it would convey no joint information about the outcome of the horse race. So given the definition of what we care about (the horse race), SI is definable and works, and a long random signal has zero of that.
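Joe's horse-race numbers are easy to check. With 8 equally likely winners, H(Y) = log2 8 = 3 bits; a perfectly reliable informant leaves no residual uncertainty, so I(X;Y) = 3 bits, while a signal independent of the race carries zero joint information however long it is. A minimal sketch of that calculation (the uniform "independent" table stands in for the Victorian novel or any other unrelated signal):

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table p(x, y)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

n = 8
# Reliable informant: the signal x always names the winning horse y.
informant = np.eye(n) / n
# Independent signal: x tells you nothing about which horse wins.
independent = np.ones((n, n)) / n**2

print(mutual_information(informant))    # 3.0 bits
print(mutual_information(independent))  # 0.0 bits
```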

Mike Elzinga · 5 January 2008

When people say that natural selection builds lots of information into our genome, they must be talking about something other than Shannon information. They are implicitly talking about information that conveys joint information about something (such as what the organism’s phenotype should be to be highly fit). That is what Dembski was trying to get at with his advocacy of specified information, and what Orgel was getting at when he originated talking about specification.
If you want to say that natural selection builds lots of “information” into a genome, you would have to be talking about the contingent history of the organism, as Popper’s Ghost suggests. Survival of a pattern at any point depends on the regularity of the laws of physics, but contingencies do the switching among allowed possibilities. Ultimately you would have to find a way to dig out that “information” about the many possible contingent histories that could lead to the same current state.

The genome at any point in its history forms a complicated “template” with many degrees of freedom for the next generations to build on. Switching among these degrees of freedom (within the constraints of quantum mechanics and the emergent properties of a complicated collection of molecules) is contingent on what exists in the environment plus what distribution of characteristics an organism has at the moment. But even if you follow a particular line to a given species, in what sense do you have any more “information” than if you followed another line to another possible species? Each has its own contingent history, and for the most part, it is a random selection among the many degrees of freedom contained within the current (and evolved) genome at any given time.

So it is not clear what point you are making when you suggest that the genome contains “information” about fitness. What do you know that you don’t already know from the process of natural selection on an evolving substrate of patterns that ultimately go back to the quantum mechanical interactions among atoms and molecules? These lower level interactions have implications for emerging interactions as systems get more complicated, and therefore the number of degrees of freedom for further development increases as systems evolve. “Fitness” is just what falls out and is relatively stable within the current environment, and a range of things could fall out and survive.

Going back to a point I made earlier, attaching some special significance to what has been contingently selected from among many possibilities seems to be at the heart of the confusion about “information” telling us about fitness. Would you say that the patterns in Saturn’s rings or in the red spot on Jupiter contain “information” about fitness of these patterns (as opposed to other just as likely patterns) to survive? How about a bunch of icicles hanging from the edge of a roof or stalactites hanging from the roof of a cave? Once they are there, is there anything special about any one of them that would lead you to suggest that it had some kind of pattern within it that made it more fit for survival than any other possibility that could have occurred but didn’t? You might be able to say that further development along a particular icicle or stalactite is more probable, but that seems like a trivial piece of information that doesn’t add anything to the picture.

Alternatively, attempting to collect enough “information” from a current configuration in order to make a statement about how hard it would be to reconstruct it ab initio fails to take into consideration the way in which it was historically constructed. It goes back to Dembski’s and Behe’s errors in assuming something is so complex that it couldn’t have evolved. So, it is not clear (to me at least) what you are aiming at in your attempts to apply “information theory” to biological systems.

Popper's Ghost · 5 January 2008

Carl Bergstrom and I are almost alone among posters here in thinking that the idea of specified information is not useless or mistaken. The articles that have been cited here as proving that SI is wrong are ones that point out (correctly) that Dembski did not have a clear definition of what the genome was supposed to be conveying information about. (And they also pointed out that he brought in Kolmogorov information in a backwards way that doesn’t help). But underlying Dembski’s argument is his belief that the adaptive information contained in our genome cannot have been put there by natural selection. He is wrong about that, it can be put there by selection.

Sigh. What part of "I think this is a fruitful approach and have argued something along these lines myself (note my comment above about seeing the genome as a message about the history of environments of the organism)" don't you understand? Read Torbjörn's posts as well. I'll say it again: "Get yourself an education in the mathematics of information theory, or you’ll screw up the details and educated people won’t be able to appreciate your valid points....That doesn’t necessarily mean that [you are] wrong about the genome being reasonably viewed as a message about the history of the organism, but [your] work will unfortunately be discounted by people familiar with information theory if [you] don’t get the fundamentals right."

So, yes, Popper’s Ghost, my statement about Tex’s posting was mistaken and yes, in that respect I was wrong. Crow away.

Listen, bozo, that's all I said. Yet you continued to deny it and Bergstrom added his sophistic defense of your error. Grow up.

Popper's Ghost · 5 January 2008

It’s going to be hard to find a different one, because “information” is also a word used in colloquial English as well as in the languages of Mathematics and of Computer Science.

"information", as used in "the languages of Mathematics and Computer Science", is Shannon information.

If I need to have some idea which of 8 horses will win a race, and you are the inside tipster and give me the word that the race is fixed and horse number 3 will definitely win, and it does, maybe I should say “thanks for the Whatchamacallit, it was Totally Tubular!” Or maybe …

Well, you could call it a tip. But the tip could have been wrong, in which case it would have been "misinformation", or "disinformation", in the vernacular. The tip might even be wrong despite horse number 3 winning; perhaps the race wasn't fixed, and the tipster pulled a lucky guess out of his ass. The colloquial as well as technical term for that is "bullshit". The bottom line is that your subjective notion of "information" as something "useful", which you here tie to correctness, breaks down under scrutiny. There is a word that you can use, the word I used in my posts above: The tipster gave you a message. The message is in a language, and it refers to a specific horse, a specific race, and so on -- there's rich theory about language as an abstraction. This interpretation of a sequence as a message is independent of how subjectively useful you find it or whether it is veridical. One can, I think, model the genome as a message that refers to the environments of the predecessors of the organism. I don't think that Mike Elzinga's objection is relevant; a message about the winner of a horse race could be interpreted as being about many other horse races (it can also be interpreted as being in a different language, about a different subject altogether), but that's neither here nor there; we only care about the relationship between the message and history as it actually played out.

Popper's Ghost · 5 January 2008

P.S. note that interpreting a sequence as a message in a specific language gives you everything you (should) want from that word "specified". Looking at the Bergstrom & Lachmann paper, I don't see the word "specified" anywhere (in fact the string "specif" does not occur). I do note that they provide this definition:

Definition: The value of information associated with a cue or signal C is defined as the difference between the maximum expected payoff or fitness that a decision-maker can obtain by conditioning on C and the maximum expected payoff that could be obtained without conditioning on C.

If they make the distinction, why doesn't Joe? Surely it's the value of the horse tip that matters. Also, I wonder why this definition can't be given more simply as

Definition: The value of a signal S is defined as the difference between the maximum expected payoff or fitness that a decision-maker can obtain by conditioning on S and the maximum expected payoff that could be obtained without conditioning on S.
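For comparison, here is how Bergstrom & Lachmann's definition plays out numerically in Joe's fixed-race scenario: conditioning on the tip moves you from the best blind bet to the sure thing. A minimal sketch with a made-up payoff table (the 8-to-1 payoff is an assumption for illustration, not anything from their paper):

```python
import numpy as np

n = 8
payoff = 8 * np.eye(n)         # payoff[action, outcome]: betting on the winner pays 8, else 0
p_outcome = np.ones(n) / n     # each horse is equally likely to be the fixed winner

# Without the signal: pick the single action with the best expected payoff.
best_blind = max((payoff[a] * p_outcome).sum() for a in range(n))          # 1.0

# With a perfectly informative signal: pick the best action for each outcome.
best_signalled = sum(p_outcome[y] * payoff[:, y].max() for y in range(n))  # 8.0

print(best_signalled - best_blind)   # value of the signal = 7.0, in payoff units
```

Note that the value comes out in payoff units, not bits, which is one way this notion differs from the 3-bit Shannon figure for the very same tip.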

PvM · 5 January 2008

When people say that natural selection builds lots of information into our genome, they must be talking about something other than Shannon information.

— Joe
Not really, all we are saying is that there exists a correlation between the environment and the genetic coding. See the work by Adami or Schneider.

Mike Elzinga · 5 January 2008

If all one is trying to do with the idea of “information” is to make some quantitative assessment about survivability or fitness in a given environment, then I don’t see anything that is either profound or necessary in the idea, or that isn’t already available in other, more obvious kinds of calculation such as simple probability or change-of-entropy calculations.

Physical systems are full of examples in which improbable states are more easily reached by way of nearby states (crystal growth, catalysis and catalytic substrates, optical pumping, bootstrap processes, pressure changes, temperature changes, etc.). It is then possible, at least in principle, to calculate the probabilities of reaching and surviving in various nearby states from the current state (where the current state itself may be a result of the system’s history). Presumably, if one could nail down the probabilities of reaching successive stages in an evolving biological system, one could model theoretical evolutionary systems far out into their future by bootstrapping off successive states of development.

But one can’t go back too far in the history of the system in order to do such a calculation on a late state because earlier states may not lead to many (or any) of the later states. That has to be a matter of system history. I think Torbjörn’s quote of Dawkins (comment #139317) is another way of stating these ideas in the case of gene pools.

Popper's Ghost · 5 January 2008

BTW, they themselves slip into the language of my alternative definition, e.g., "the fitness value of an informative cue". Of course "informative" is redundant; a cue is "informative" precisely to the degree that it has value (as defined).

Popper's Ghost · 5 January 2008

So given the definition of what we care about (the horse race), SI is definable and works

What is this working definition of "specified information"? How does it differ from Bergstrom and Lachmann's definition of "value of information"?

Popper's Ghost · 5 January 2008

P.S. how do you define a horse race? Perhaps you meant "specification of what we care about" ... how do you specify it, in a way that "works"?

Popper's Ghost · 5 January 2008

I failed to emphasize Dawkins's description of what information means in such a Shannon analogy that he contemplated. (Which btw PG here also describes.)

Dawkins gets at the critical point that I omitted when I said that the genome is a message about the environments of the organism's predecessors. More specifically, it's a message that says "the organism's predecessors survived to the point of reproduction by generating these proteins and regulating development like so". It's not a message to the organism, as the organism simply represents the message. Rather, it's a message to us; that is, we can interpret the genome of an organism as telling us about the environments the organism's predecessors faced and what responses were found by NS+RM that allowed reproduction down the line.

Popper's Ghost · 5 January 2008

Not really, all we are saying is that there exists a correlation between the environment and the genetic coding. See the work by Adami or Schneider.

e.g., http://www.ccrnp.ncifcrf.gov/~toms/paper/ev/ and http://www.talkorigins.org/indexcc/CB/CB102.html From the latter:

According to Shannon-Weaver information theory, random noise maximizes information. This is not just playing word games. The random variation that mutations add to populations is the variation on which selection acts. Mutation alone will not cause adaptive evolution, but by eliminating nonadaptive variation, natural selection communicates information about the environment to the organism so that the organism becomes better adapted to it. Natural selection is the process by which information about the environment is transferred to an organism's genome and thus to the organism (Adami et al. 2000).
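The Adami/Schneider point, that it is selection acting on variation (not mutation alone) that builds up the correlation between environment and genome, is easy to see in a toy simulation: evolve bit strings toward a hidden "environment" string and watch the per-site agreement climb well above the 50% expected by chance. A minimal sketch in the spirit of Schneider's ev, not a reimplementation of it (all parameters are arbitrary):

```python
import random

random.seed(1)
L, POP, GENS, MU = 32, 100, 200, 0.01
environment = [random.randint(0, 1) for _ in range(L)]   # the hidden "target" the pool adapts to

def fitness(genome):
    # Fitness = number of sites that match the environment.
    return sum(g == e for g, e in zip(genome, environment))

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):
    # Selection: keep the fitter half. Reproduction with mutation: refill the pool.
    pop.sort(key=fitness, reverse=True)
    survivors = pop[: POP // 2]
    pop = [[1 - g if random.random() < MU else g for g in random.choice(survivors)]
           for _ in range(POP)]

# Fraction of sites agreeing with the environment (0.5 means no correlation at all).
print(sum(fitness(g) for g in pop) / (POP * L))
```

Drop the selection step (reproduce everyone at random) and the agreement stays near 0.5, which is exactly the point: mutation supplies the variation, but it is selection that transfers the information about the environment into the gene pool.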

Popper's Ghost · 5 January 2008

Does this mean that Shannon's concept is applicable only when there is a transmitter (of a message) and a receiver to interpret that message? Then, Shannon's equation would not be useful in estimating “information content” of a system that we do not expect to contain a message?

The concept applies whenever a system can be modeled as having a transmitter and receiver. This is the power of formal abstractions, which people often fail to grasp. For instance, some people say the brain is not a computer because it doesn't have this or that physical feature that people assume computers have. But all that matters is that the brain can be modeled as doing computations, regardless of the details of how it carries that out.

And “noise” and “random” are in no way synonyms; the noise can be highly ordered – think punk rock or any other kind of music you don’t like :-), an interfering radio station, etc. We are now discussing “noise” in its colloquial meaning. I have equated (maybe incorrectly) noise with randomness, (white) noise being a pure example of randomness.

Sorry, my first example was poorly presented; think of trying to carry on a conversation while loud music is playing. The music is noise that interferes with reception of the signal (what the other person is saying). Conversely, the conversation is noise if you're trying to listen to the music. Similarly, two interfering radio signals are each noise relative to each other. Which is the signal is determined by which station is designated as the source. "White noise" is a random signal with a flat power spectral density. If, say, you were testing some equipment for its ability to detect white noise, and your tests were screwed up by interference from a radio station playing beautiful Mozart, the Mozart would be the noise.
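The "flat power spectral density" remark can be checked numerically: the periodogram of white noise spreads power roughly evenly across frequencies, whereas a "musical" signal concentrates it. A minimal sketch with NumPy (the tone frequency and lengths are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n)

white = rng.standard_normal(n)        # white noise
tone = np.sin(2 * np.pi * 0.1 * t)    # a pure tone at one frequency

def periodogram(x):
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

# White noise: no single dominant peak; the tone: almost all power in one bin.
print(periodogram(white).max() / periodogram(white).mean())   # modest ratio
print(periodogram(tone).max() / periodogram(tone).mean())     # very large ratio
```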

Mike Elzinga · 6 January 2008

...the Mozart would be the noise.
Nah; Mozart turns everything else into noise. :-)

Mike Elzinga · 6 January 2008

According to Shannon-Weaver information theory, random noise maximizes information.

This is not as strange as it first appears. It is often used to settle a system into a selected state by randomly "vibrating" it so that nearby metastable states are emptied (e.g., shake something or tap on it to get it to settle into position).

PvM · 6 January 2008

According to Shannon-Weaver information theory, random noise maximizes information.

It maximizes entropy not information.

Mike Elzinga · 6 January 2008

It maximizes entropy not information.

Well, ok, it depends on the system and how a "signal" is contained within noise. In image processing, for example, an image can be buried in noise which is characterized by high spatial frequencies. Adding random noise in the form of "dithering" randomizes the phases of the higher spatial frequencies more than the lower spatial frequencies of the image of interest. This "washes out" the noise more than it does the image of interest, and so the image of interest is thereby enhanced. It seems counterintuitive, but it works very well. It is used, for example, in radar imaging and laser imaging. There is a very simple mathematical description of this in the Fourier transform domain of the image. It is often referred to as a "poor man's low pass filtering process".

A similar example can be seen when you look through bushes at an object in the distance on the other side. If you "dither" your position (or simply move your position steadily), the higher spatial frequencies of the branches and leaves wash out enough to clarify the object on the other side. The reason the object needs to be at some distance is because you want the relative motion of the object with respect to the branches and leaves to be as high as possible. So the image on your retina consists of a relatively stable object and a blurred background of leaves and branches. I use this technique all the time as I drive past bushes obstructing vision on street corners in my neighborhood. I can see what is coming down the cross street more easily. I learned this trick from my experiences with image processing.
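Mike's dithering trick is, as he says, a poor man's low-pass filter: averaging views taken at slightly shifted positions smears out the fine-scale occluder while the coarse object of interest survives nearly untouched. A minimal 1-D sketch (the smooth "object" and the fine "branches" pattern are both made up for illustration):

```python
import numpy as np

x = np.linspace(0, 1, 1000)
scene = np.exp(-((x - 0.5) ** 2) / 0.01)            # broad "object" (low spatial frequency)
bush = 0.8 * np.sign(np.sin(2 * np.pi * 80 * x))    # fine "branches" (high spatial frequency)
view = scene + bush

# "Dither": average the view over many slightly shifted positions.
shifts = range(-25, 26)
dithered = np.mean([np.roll(view, s) for s in shifts], axis=0)

# The fine branch pattern largely cancels; the broad object is barely affected.
print(np.corrcoef(view, scene)[0, 1])       # noticeably below 1
print(np.corrcoef(dithered, scene)[0, 1])   # close to 1
```

In the Fourier picture this boxcar averaging multiplies each spatial frequency by a sinc-like factor, which is why the high-frequency "branches" are attenuated much more than the low-frequency "object".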

Popper's Ghost · 6 January 2008

Nah; Mozart turns everything else into noise. :-)

The joke was already implicit in my use of that counterintuitive example; no need to be so crass as to make it explicit.

Popper's Ghost · 6 January 2008

It maximizes entropy not information.

Argue with Adami; it's his statement.

Popper's Ghost · 6 January 2008

Hmmm ... actually, while the talkorigins page I linked to attributes the statement to (Adami et al. 2000), nothing of the sort appears in that paper.

Toni Petrina · 6 January 2008

It is also interesting to notice that Dembski equivocates between specification and function. Another error is that he calculates complexity with regard to uniform complexity and from scratch, ignoring the natural contingency of biological systems.

Thus, it is trivial to show that when we discovered that HIV evolved VPU unit (thanks to Abbie Smith), HIV virus is CSI. I is complex (because Dembski ignores previous steps), it has a function so it is specified thus HIV with VPU (HIV-1?) is CSI and it is obviously designed. By definition.

Thus CSI fails because if you refuse to model according to evolutionary models (all organisms descend one from another, and so do their gene pools) you can make everything out to be CSI. In other words, all mutations that change function are designed mutations.

Toni Petrina · 6 January 2008

Err, instead of "with regards to uniform complexity" it should be "with regards to uniform probability". Also, "It[HIV] is complex" not I :o

PvM · 6 January 2008

Popper's Ghost:

It maximizes entropy not information.

Argue with Adami; it's his statement.
Is it? Can you show me where in his 2000 article he makes this claim? I'd say he makes exactly the opposite one... Perhaps your 'reference' was wrong? It always helps reading the original literature, it's a mistake commonly reserved for creationists though

ndt · 7 January 2008

Merlin said: Now, why should I doubt my intuition?

Because you are doing science. Disregarding intuition is at the heart of the scientific method, and rightly so.

Joe Felsenstein · 7 January 2008

Many messages ago,
Merlin: Or if there were a good model of how NS could weed out all the bad mutations and keep the ones that increased information/improved function/created new structures, then you might be on to something. But along with what might be improved function/increased information there is clearly a lot of the opposite. Five thousand plus known genetic diseases? Wouldn't any increased information/improved function be swamped by the garbage?
No, it wouldn't. There is a good model of how natural selection can do that. The garbage is held to low frequencies in the population by natural selection against it. The "improved function / increased information" (i.e. beneficial) mutations are far more likely to rise to fixation in the population. The calculations are called theoretical population genetics. The field has over 100 years of development -- the particular formulas you need were published by Motoo Kimura in 1962. Use them to show us that your intuition is correct. Or if you can't, ask us to and we'll show that your intuition isn't correct.

PvM · 7 January 2008

Intuitively, (what one relies on when one is ignorant), one knows specification when he sees it. (Wasn’t it Dawkins who acknowledged this when he told his students that what they were seeing is only the appearance of design?) And intuitively one knows what information is, and intuitively, YOU know that you cannot make random changes to any highly complex, functioning system and expect improvement. Would you go into an automated factory, make a few random changes and go to the shipping dock to await the new products? If you do, take a book. You’ll be there a while.

Intuitively you are wrong about many of these claims. Specification is simple: It's called function in biology. ID is abusing the confusion between information as defined by ID and information as intuitively understood to further its case. Information according to ID is merely improbability and yes, random changes AND selection trivially increase information in the genome. It's this intuitive confusion which ID proponents seem to abuse to further their vacuous claims. As to information increasing under processes of variation and selection, see the work by Tom Schneider, or Adami, Lenski and others.

Or if there were a good model of how NS could weed out all the bad mutations and keep the ones that increased information/improved function/created new structures, then you might be on to something. But along with what might be improved function/increased information there is clearly a lot of the opposite. Five thousand plus known genetic diseases? Wouldn’t any increased information/improved function be swamped by the garbage?

There is a good model of how NS weeds out the bad mutations and maintains the good ones. Sure, there are some genetic diseases, but they do not seem to spread; luckily, we humans have a way to deal with them: it's called sex.

Shebardigan · 7 January 2008

Merlin: ... intuitively, YOU know that you cannot make random changes to any highly complex, functioning system and expect improvement.
Actually, life presents a number of counterexamples. Much (but not all) of pre-modern medicinal practice appears to be based upon making more-or-less random changes to diseased systems, in some cases valuable knowledge resulting. (Truth to tell, some of modern medicinal practice appears to be based on this approach.) And there was a guy I knew in middle school, back in the late 1950s, who developed the "tap" method of repairing vacuum-tube radio receivers: remove the chassis from the cabinet, connect a wire to some random point in the circuitry, apply power to the unit, briefly connect the wire to random points in the circuitry, and rejoice if an improvement is observed. (The few occasions when the procedure resulted in something like "connect B+ voltage to audio amplifier screen grid" provided sufficient entertainment for the rest of us that we encouraged him in this behavior. From a safe distance.)

Torbjörn Larsson, OM · 7 January 2008

Thanks all, but particularly to Mike Elzinga and PG, for enlightening examples of contingencies in evolutionary history as they pertain to Dawkins et al.'s descriptions of information in variation and selection.

Torbjörn Larsson, OM · 7 January 2008

@ Merlin:
Dawkins did not answer the question because he could not. And when he tried, he answered a different question.
I think he made it very clear that he answered the question about a "mutation that increased information". And why do you care about Dawkins especially, as RBH gave you a few specific examples of mutations that improve function after fixation (for example by selection)?
if there were a good model of how NS could weed out all the bad mutations and keep the ones that increased information/improved function/created new structures, then you might be on to something.
Why don't you read what Dawkins said, as he describes just that, in the context of increasing and decreasing a range of evolutionary probabilities (alleles)? Or perhaps you will then be satisfied that biologists indeed are "on to something". They have been, for over 150 years. Or don't you care for the history of science either?
Wouldn’t any increased information/improved function be swamped by the garbage?
According to some recent papers, the rate of adaptive human evolution has vastly increased during the past 40,000 years and especially the last 10,000. If you read the material, you can see that the number of beneficial (and neutral) mutations has measurably increased compared to deleterious ones. So what you propose is observably not happening, quite the reverse, for at least one species. [Btw, I believe somewhere in the related posts is also a note by Hawks that the observed rate short-term violates Haldane's limits by a factor 10 - 100, making a joke of another creationist scam claim.]

Popper's Ghost · 8 January 2008

Is it? Can you show me where in his 2000 article he makes this claim? I’d say he makes exactly the opposite one… Perhaps your ‘reference’ was wrong? It always helps reading the original literature, it’s a mistake commonly reserved for creationists though

It helps to read ahead before responding ... like, as far as the next message where I noted -- after reading the original literature -- that the statement I quoted from talkorigins seems to have misrepresented its source.

Popper's Ghost · 8 January 2008

Wouldn’t any increased information/improved function be swamped by the garbage?

No, because the garbage dies. This is what the "random processes" folks just can't seem to get into their heads, or remember for more than a fraction of a second. The requirement that changes must be reproduced in order to be retained results in selection of those changes that favor reproduction. That was Darwin's critical insight.

Popper's Ghost · 8 January 2008

intuitively, YOU know that you cannot make random changes to any highly complex, functioning system and expect improvement.

Evolution isn't about changes to functioning systems, it's about producing new systems that are modified versions of functioning systems, with the modified versions that don't function being discarded and the ones that do being retained. Try applying some intuition to that. Say to yourself over and over again "descent with modification" until you get it.

PvM · 8 January 2008

Popper's Ghost:

Is it? Can you show me where in his 2000 article he makes this claim? I’d say he makes exactly the opposite one… Perhaps your ‘reference’ was wrong? It always helps reading the original literature, it’s a mistake commonly reserved for creationists though

It helps to read ahead before responding ... like, as far as the next message where I noted -- after reading the original literature -- that the statement I quoted from talkorigins seems to have misrepresented its source.
Excellent. Sorry for not seeing the subsequent posting.

Merlin · 8 January 2008

Joe Felsenstein

A quick search gave me Kimura's neutral theory of mutation. Is that the one? It seems to me that it all depends on the ratios of deleterious, neutral and beneficial mutations. I have heard estimates of one out of 100, 1 out of 1000, and fewer beneficial mutations to all others. What are the current estimates?

Of course, deleterious mutations TEND to get weeded out, and the worst are weeded out the best, but with new ones arising continually and beneficial ones rare, the net effect would seem to be negative. Could this be resolved by a computer model?

There is still the problem of getting new structures/new functions by mutations.

On some creationist or ID website, I read a discussion of an article claiming that the accumulation of "neutral" mutations had a negative effect. If I see it again I'll let you know.

PvM and Popper's Ghost

"Specification is simple: it's called function in biology"

Yes, however, the thing commonly specified is a polypeptide sometimes thousands of amino acids long that has to fold in a certain way. Lots of room for error. Not much for improvement. (Sex spreads the errors.) Are rates of genetic diseases decreasing? Have they been decreasing throughout the existence of humans?

Torbjorn Larsson

Beneficial, and trivial increases in information, perhaps. And you may be correct on everything else if you assume the evolution that I am questioning. We observe genetic diseases. Where are the new structures/new functions?

....

Demski, using work by Douglas Axe, describes small islands of functioning genes surrounded by a vast ocean of non-functioning possible sequences of amino acids. The implication being that you cannot get from one function to another by descent with modification. Is this a fair picture?

Mike Elzinga · 8 January 2008

Demski, using work by Douglas Axe, describes small islands of functioning genes surrounded by a vast ocean of non-functioning possible sequences of amino acids. The implication being that you cannot get from one function to another by descent with modification. Is this a fair picture?
Aside from the fact that Dembski’s work is completely bogus (and, therefore, its implications not worth considering), why can’t one get from one function to another by descent with modification? Where did you learn that? What does the fossil record show?

Popper's Ghost · 8 January 2008

Excellent. Sorry for not seeing the subsequent posting.

Possibly because of the dog awful software at this site that hides posts (I now habitually check for them after hitting preview, before posting). Any chance it will ever get fixed?

Popper's Ghost · 8 January 2008

Of course, deleterious mutations TEND to get weeded out, and the worst are weeded out the best, but with new ones arising continually and beneficial ones rare, the net effect would seem to be negative.

Yeah, it seems so much more likely that populations would become less and less fit over time -- despite natural selection favoring fit over unfit organisms -- if it weren't for the constant insertion of space aliens' pudgy digits into the process. Mutations that destroy function don't reproduce (even if they manage to survive one generation, they will perish in subsequent ones). The remaining mutations introduce variation. If you think in terms of populations and very large numbers of generations, you might be able to get your intuitions in line with the observed facts -- which you won't find "on some creationist or ID website" or in the work of "Demski". You talk about "assume the evolution I am questioning". You might as well talk about "assuming" the Earth is round -- there's no need to assume what has been observed.

Popper's Ghost · 8 January 2008

Demski, using work by Douglas Axe, describes small islands of functioning genes surrounded by a vast ocean of non-functioning possible sequences of amino acids.

Hey, I thought ID predicts that all DNA is functional?

The implication being that you cannot get from one function to another by descent with modification.

Hunh? How in the heck is that an implication of the above? The fact is that we have some rather detailed pictures of how one function evolved from another, including ID's favorite targets, the bacterial flagellum and the human blood clotting system.

Joe Felsenstein · 8 January 2008

Merlin: A quick search gave me Kimura's neutral theory of mutation. Is that the one? It seems to me that it all depends on the ratios of deleterious, neutral and beneficial mutations. I have heard estimates of one out of 100, 1 out of 1000, and fewer beneficial mutations to all others. What are the current estimates? Of course, deleterious mutations TEND to get weeded out, and the worst are weeded out the best, but with new ones arising continually and beneficial ones rare, the net effect would seem to be negative. Could this be resolved by a computer model?
No, it is not the neutral theory, but Kimura's (1962) publication of the probability of fixation of a beneficial or deleterious mutant. It will be found in this paper of his:
Kimura, M. 1962. On the probability of fixation of mutant genes in a population. Genetics 47: 713-719.
which is made available freely here. The relevant equations are (8) and (15). Another place to find it is in Chapter VII of my online population genetics book. Use of these formulas will show that the proposed "swamping" is not going to happen, and will show how much more common deleterious mutants have to be than advantageous ones to have a greater effect. Let us know what you discover.
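For readers who want to try Joe's suggestion, the relevant result simplifies, for a new mutant at initial frequency 1/(2N) under additive (genic) selection, to u(s) = (1 - exp(-2s)) / (1 - exp(-4Ns)). A minimal sketch of the "swamping" calculation (the population size and the 1000:1 ratio of deleterious to beneficial mutations are illustrative assumptions, not estimates):

```python
import math

def fixation_prob(s, N):
    """Kimura (1962): fixation probability of a new additive mutant with
    selection coefficient s in a diploid population of effective size N."""
    if s == 0:
        return 1 / (2 * N)   # neutral limit
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))

N = 10_000
# Assume deleterious mutations are 1000 times as common as beneficial ones.
n_del, n_ben = 1000, 1
p_del = fixation_prob(-0.001, N)   # mildly deleterious
p_ben = fixation_prob(+0.001, N)   # mildly beneficial

print(n_del * p_del)   # expected deleterious fixations: effectively zero
print(n_ben * p_ben)   # expected beneficial fixations: about 0.002 (roughly 2s)
```

Even with deleterious mutations a thousand times more common, essentially none of them ever fix, while each beneficial one fixes with probability of roughly 2s, which is why the proposed "swamping" does not happen.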

Torbjörn Larsson, OM · 8 January 2008

I believe somewhere in the related posts is also a note by Hawks that the observed rate short-term violates Haldane’s limits by a factor 10 - 100, making a joke of another creationist scam claim.
Ehrm, always check before posting. [Hangs head in shame.] Hawks et al.'s work blows Haldane's limit out of the water:
What about the Haldane limit? I thought such rapid evolution was impossible! J. B. S. Haldane famously estimated that the substitution cost of new alleles in humans limited the rate of adaptive evolution. In his estimate, the slow rate of human reproduction limited substitutions to one every 300 generations. This became known as the "Haldane limit". Motoo Kimura used Haldane's argument as a reason why selection could not explain the substitution rate, and he asserted that as support for his neutral theory of molecular evolution. We do not challenge neutralism, but it is clear that Haldane's limit is a problem, since every estimate says that humans have been evolving at many times that rate. Maynard Smith (1968, Nature) showed that Haldane's argument depended on the unrealistic assumption of independence among all selected loci, so that the substitution load depends critically on the fitness of the optimal genotype among all selected loci. If selection on many loci is non-independent, then a very large number of genes may be selected with the same substitution load as a single gene under Haldane's assumptions. Later, Ewens (e.g., 1972, Am Naturalist) made a similar argument. Ewens (2004, Mathematical Population Genetics) reviewed this problem, pointing out an additional weakness of Haldane's argument: it depends solely on mortality selection, while many genes may be under fertility selection. These considerations show that Haldane's limit does not constrain the adaptive substitution rate in humans to 1/300 generations, and our estimated rate of 13 per generation is not excluded. Moreover, considering the high infant and juvenile mortality evidenced in Neolithic and later populations, much of that death rate resulting directly from disease and dietary deficiencies, the number of selective deaths available to drive substitutions has clearly been high.
So not only is the Haldane limit observably surpassed by a factor of well over 1,000 (13 substitutions per generation against Haldane's 1 per 300 generations is a factor of roughly 3,900), it can be a persistent violation in their model. Another IDC claim bites the dust for good.

Torbjörn Larsson, OM · 8 January 2008

@ Merlin:
And you may be correct on everything else if you assume the evolution that I am questioning.
I don't think so. I'm not a biologist, but Hawks says they test for linkage disequilibrium with an LDD test:
In population genetics, linkage disequilibrium is the non-random association of alleles at two or more loci, not necessarily on the same chromosome. ...
Linkage disequilibrium is generally caused by genetic linkage and the rate of recombination; mutation rate; random drift or non-random mating; and population structure.
Which of these observable effects are dependent on an assumption of evolution theory in your eyes? I don't think any are [again, not a biologist], and as they do a test they get an independent test of evolution theory, that hereditary changes happen in populations over longer periods of time. But even if they didn't, evolution theory isn't simply assumed by biologists; it is a basic theory long since validated by numerous other tests.
Where are the new structures/new functions? ….
You can answer that question yourself by reading the paper for the data on observed change.

Joe Felsenstein · 9 January 2008

Torbjörn Larsson, OM:
I believe somewhere in the related posts is also a note by Hawks that the observed rate short-term violates Haldane’s limits by a factor 10 - 100, making a joke of another creationist scam claim.
Ehrm, always check before posting. [Hangs head in shame.] Hawks et al work blows Haldane's limit out of the water:
What about the Haldane limit? I thought such rapid evolution was impossible! J. B. S. Haldane famously estimated that the substitution cost of new alleles in humans limited the rate of adaptive evolution. In his estimate, the slow rate of human reproduction limited substitutions to one every 300 generations. This became known as the "Haldane limit". Motoo Kimura used Haldane's argument as a reason why selection could not explain the substitution rate, and he asserted that as support for his neutral theory of molecular evolution. We do not challenge neutralism, but it is clear that Haldane's limit is a problem, since every estimate says that humans have been evolving at many times that rate.
(Larsson's second block of quotation was not from himself but from John Hawks's web site, which he had referenced). I wrote a paper in 1971 (in American Naturalist) in the debate about the meaning of Haldane's limit, a debate Kimura had set off by using it as an argument that most substitutions had to be neutral. In my paper I pointed out that Haldane's model was one in which the environment deteriorates, and then the population is rescued by a substitution, after a time. I redid Haldane's argument (more exactly). The Haldane limit is set by the reproductive excess of the population, how many offspring there are above 1 per parent. If the reproductive excess is not sufficient, the population will be driven extinct by the deteriorations of the environment. However, if some substitutions are not deleterious, but advantageous, they do not pose any threat to the continued existence of the population. In fact, they increase the ability of the population to bear the cost of the deteriorations of the environment. That is another reason why Haldane's limit is not a problem. People accept the Haldane calculation too uncritically without thinking of this issue. Scenarios involving gene interaction are not necessary to escape the Haldane "limit".

Merlin · 10 January 2008

Torbjorn Larsson and Joe Felsenstein

I will try to get through the materials you have referenced. Time and the math may be a problem. Thanks. I can see that I am not well enough informed to discuss this with you. Nonetheless, the problem is getting new information/structures/function, by whatever definition. I understand that one can design a good radio antenna using an evolutionary algorithm, but one will never happen upon a bicycle wheel using the same algorithm.

Mike Elzinga

The fossil record shows little change, once a species shows up. Isn't this why the theory of punctuated equilibrium was proposed; to explain the stasis?

Popper's Ghost

Dembski, of course. Thanks, my mistake, and I should have said amino acids when I said genes. It is in No Free Lunch. For a functioning enzyme that has 1000 amino acids, there are 20 to the 1000th possible sequences of that same length. Sequences that vary slightly from the functioning one will perform the same function, but as more changes are made, function drops off and ceases, but the number of the remaining possible sequences is so huge that it is impossible to get to another functioning sequence by chance.

GuyeFaux · 10 January 2008

I understand that one can design a good radio antenna using an evolutionary algorithm, but one will never happen upon a bicycle wheel using the same algorithm.

This would be a consequence of GAs "lacking a target"; so you couldn't expect a GA to arrive at a specific solution, like a bicycle wheel. But I don't think that's what you mean: I think you mean that for some reason GAs can't produce something as Platonic as a perfect circle. Can you clarify why not? I can imagine evolutionary algorithms that would be able to produce something close to a bicycle wheel.

Stanton · 10 January 2008

Merlin: Dembski, of course. Thanks, my mistake, and I should have said amino acids when I said genes. It is in No Free Lunch. For a functioning enzyme that has 1000 amino acids, there are 20 to the 1000th possible sequences of that same length. Sequences that vary slightly from the functioning one will perform the same function, but as more changes are made, function drops off and ceases, but the number of the remaining possible sequences is so huge that it is impossible to get to another functioning sequence by chance.
This is utter nonsense. Dembski knows absolutely nothing about genetics. He conveniently neglects to factor in things like silent or quiet mutations, where an altered codon still codes for the same amino acid, or where a changed amino acid does not change the function of the protein. Silent mutations occur all the time in nature. Dembski also blatantly ignores the fact that advantageous mutations have been observed, both in nature and in the laboratory. http://en.wikipedia.org/wiki/Silent_mutation

Glen Davidson · 10 January 2008

I understand that one can design a good radio antenna using an evolutionary algorithm, but one will never happen upon a bicycle wheel using the same algorithm.

Yes, very good. Evolution, or its analogs, the GAs, won't make a wheel, or at least it's quite unlikely (intermediates are difficult to imagine, and there are severe physiological constraints). Design, by contrast, is likely to produce a wheel. That's why wheels have been designed by humans, and are not found in biology. Because biology has not been designed (except for a few tweaks by ourselves). Glen D http://tinyurl.com/2kxyc7

Joe Felsenstein · 10 January 2008

Merlin: Torbjorn Larsson and Joe Felsenstein I will try to get through the materials you have referenced. Time and the math may be a problem. Thanks. I can see that I am not well enough informed to discuss this with you. Nonetheless, the problem is getting new information/structures/function, by whatever definition. I understand that one can design a good radio antenna using an evolutionary algorithm, but one will never happen upon a bicycle wheel using the same algorithm.
The problem that you raised was whether the large number of deleterious mutations would "swamp" the lower rate of advantageous mutations and cause fitness to decline. I predict that when you do the calculations you will find that this won't happen. The issue of whether the favorable mutations will be truly novel is a completely separate one, and not what I was responding to. I await with interest your use of the equations I cited, and the conclusions from them about whether swamping will occur. If you can't do it, let us know and I will do an example and post it here.

Mike Elzinga · 10 January 2008

The fossil record shows little change, once a species shows up. Isn’t this why the theory of punctuated equilibrium was proposed; to explain the stasis?
There can be a number of reasons for “stasis”, including an imperfect fossil record. But, in general, stasis has its analogs in all kinds of physical systems, so it would not be surprising to find it in complex living systems. Eigenmodes of energy or vibration states, in systems with several degrees of freedom, exhibit relatively stable configurations that can jump suddenly from one state to another as a result of a relatively small perturbation. How these flips occur depends on the kind of couplings that exist between different states and the nature of the perturbations. In very complicated systems, the couplings among states may be extremely small (this is necessary in order for a system to be complicated; otherwise it would be chaotic and unstable and, in the case of living systems, perhaps unable to survive; or alternatively, very adaptable). So a relatively large perturbation would be required to flip it from one state to another. In the case of a living system, periods of stasis would simply mean that the environmental perturbations aren’t sufficiently large to cause a detectable change in the phenotype that gets fossilized. On the other hand, if flipping to another state requires a perturbation that is more than the system can handle, the system is destroyed (goes extinct). So complicated systems that don’t have a sufficiently close distribution of states into which they can flip are more vulnerable to extinction if the environment in which they are immersed makes a large enough change. So there is nothing esoteric going on here.