J.B.S. Haldane, when asked "What has the study of biology taught you about the Creator, Dr. Haldane?", replied "I'm not sure, but He seems to be inordinately fond of beetles."
Discovery Institute Fellow Dr. Paul Nelson is inordinately fond of ORFans, genes unique to one species that appear to have no relatives in other species. He feels that these unique genes represent a significant challenge to evolutionary biology. However, he has not noticed that the distribution of ORFans implies that the designer is more enamoured of viruses than humans.
A very tiny mystery:
What are ORFans, and why should we care about them? ORFans take their name from Open Reading Frames (ORF’s). ORF’s are stretches of DNA that apparently code for proteins. ORFans are ORF’s that appear to occur in only one species. Note that I say “appear to”. The computer programs that are used to identify genes during whole genome assembly can falsely identify segments of DNA as ORF’s; this can be a significant issue in some genomes. Also, our computer programs for identifying related genes can miss genes that have undergone rapid evolution. An example is the rotating image below left. This is a 3D model of the protein Xc5848 from Xanthomonas campestris (it is also the static molecule above). Originally designated as an ORFan, it was identified as part of a large class of proteins by sophisticated structure analysis. The model is coloured by amino acid conservation, with red being the highest conservation and blue being poorly conserved. The model is mostly red (i.e. it’s part of a highly conserved protein family, not an ORFan at all).
ORFans come in two classes: short (often less than 100 amino acids long), which are unlikely to represent real genes as they are much shorter than most real genes, and long (usually over 150 amino acids long), which are likely to be real genes. There are far more short ORFans than long ORFans.
Paul Nelson thinks that ORFans represent a major blow to evolutionary theory. To him they break attempts to determine phylogenies and throw doubt on the idea that all organisms descended from a common ancestor. I’ve dealt with that aspect before (see also here), and I won’t go into detail here, but I would like to simply reiterate a few points. We have a number of explanations, based on evidence, for the existence of ORFans.
1) Some represent artifacts.
2) Some represent rapidly evolving genes whose origin is obscured by the pace of evolution.
3) Some represent genes horizontally transferred from organisms that have not been sequenced yet.
4) Some represent genuine, de novo genes.
Now, as I said, we have evidence for all of these explanations, and ORFans will represent a combination of all of these factors. For example, it has been estimated that about half of all short ORFans represent artifacts, but some do represent genuine protein-coding genes.
In the Firmicutes
In 2003, a fair percentage of the ORF’s found in fully sequenced prokaryotic genomes were ORFans. However, even back in 2003, it was apparent that as we sequenced more genomes we found more relatives for ORFans and fewer new ORFans.
We have a lot more data now, and the extent of the fall in ORFans can be seen by looking at the ORFan mine, a database of ORFans. As we add more genomes, we identify more relatives of things we thought were unique, and identify and purge more artifacts. Consider the Escherichia coli genome. In 2003 the total ORFans (things likely to be artifacts) in the E. coli genome constituted 5.5% of the genome, and long ORFans (things likely to be genes) represented 2.4% of the genome. By 2008, total ORFans and long ORFans represented 0.4% and 0.1% of the genome respectively. Consider also the Helicobacter pylori genome, which went from 17% total and 9% long ORFans in 2003 to 2.3% total and 0.6% long ORFans in 2008. If you look at all 60 of the genomes reported by Siew and Fischer in 2003, the total ORFans averaged 14%; by 2008 this was down to 6%. If you look at the genomes added after those 60 (i.e. all the latecomers, not those that were already characterised), their ORFan percentage is 7%. In 2003, the last 10 organisms to be added to the database had an average of 12% ORFans when first sequenced; in 2008, the last 10 organisms had 6% ORFans when first sequenced.
Even those figures may overestimate the number of ORFans: of the 19 ORFans in the E. coli database, 10 are annotated to viral or conserved proteins. Of the ones I’ve investigated, there is significant sequence similarity to other proteins (e.g. the alleged ORFan NC_000913orf2361 is annotated to be a CPZ-55 prophage, and forms a high significance phylogeny with other proteins and even has a PFAM domain in it!).
Some ORFans are not. The supposed ORFan NC_000913orf2361 is related to a whole range of conserved proteins.
So as we sequence new genomes, we are finding fewer and fewer ORFans. This is entirely consistent with the position that ORFans represent rapidly evolving proteins, horizontally transferred proteins and annotation artifacts rather than unique proteins inserted by an unknown designer by unknown mechanisms.
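The size of the drop between 2003 and 2008 is easier to appreciate as a fold reduction. Here is a minimal Python sketch that uses only the percentages quoted above; the dictionary layout, the labels and the function name `fold_reduction` are illustrative choices of mine and are not taken from the ORFan mine.

```python
# Percent of the genome classed as ORFans, as quoted in the text: (2003, 2008).
# Labels and layout are illustrative; only the numbers come from the post.
ORFAN_PERCENT = {
    "E. coli, total ORFans": (5.5, 0.4),
    "E. coli, long ORFans": (2.4, 0.1),
    "H. pylori, total ORFans": (17.0, 2.3),
    "H. pylori, long ORFans": (9.0, 0.6),
    "Original 60 genomes, total ORFans (average)": (14.0, 6.0),
}


def fold_reduction(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


if __name__ == "__main__":
    for label, (pct_2003, pct_2008) in ORFAN_PERCENT.items():
        fold = fold_reduction(pct_2003, pct_2008)
        print(f"{label}: {pct_2003}% -> {pct_2008}% ({fold:.1f}-fold drop)")
```

The individual, well-studied genomes show a much larger fold drop than the 60-genome average, which is what you would expect if closer scrutiny and better sampling remove more apparent ORFans.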
Paul Nelson likes to emphasize the number of ORFans, as this is increasing. However, the pattern of increase is very instructive. Below I’ve plotted the total number of ORF’s and ORFans with increasing number of full genomes sequenced, and the fold increase of ORF’s and ORFans with respect to the numbers of ORF’s and ORFans when only 15 genomes were sequenced (why do ID advocates never do this type of thing?). You can clearly see that the rate of increase in ORFan number is dramatically slowing. When we reached 60 sequenced genomes, ORF’s had increased 4.5-fold over the numbers present at 15 genomes, but ORFans had only just over doubled. By the time we got to 330 genomes, ORF’s had increased 25-fold from the numbers at 15 genomes, but ORFans had increased less than eight-fold. This is entirely consistent with the fact that as we add genomes, we find more relatives of these genes. ORFan numbers increase as we sequence more genomes, but ORF’s (real genes with known relatives) increase much, much faster. This is consistent with the majority of ORFans representing undersampling of phylogenies. (Data taken from Siew and Fischer, 2003, and the ORFan mine.)
Enter the virus:
Paul Nelson is now particularly taken with a paper from Fischer’s group that showed that around 38% of the ORF’s in complete virus genomes are ORFans. This figure seems to impress Paul. However, the same issues that applied to prokaryotic genomes apply to viral genomes. As shown in figure 4 of Yin and Fischer (above), as the number of viral genomes sequenced increases, the percentage of ORFans drops as relatives are found (just like prokaryotic ORFans). The phage groups with the most “ORFans” are those that have the fewest sequences (just like prokaryotes, which suggests that sampling of genomes is the main issue). Furthermore, 18% of alleged “ORFans” turn out to be horizontally transferred prokaryotic genes (just as a fair proportion of prokaryotic “ORFans” turn out to be horizontally transferred bacteriophage genes). Looking at the authors’ conclusions we find them saying:
Because the current sampling of phages (and of bacterial genomes in general), is limited and biased towards particular groups, the percentage of ORFans in different phage groups varies significantly. This low sampling may be a factor contributing to the abundance of phage ORFans, but is not likely to be the only one. That is, even after many more genomes are sequenced, we expect to find a significant number of ORFans and near-ORFans, awaiting interpretation. There are also other possibilities to account for the ORFans’ origin, like rapid divergence after horizontal transfer (from hosts or from other viruses, from existent genomes or yet extinct genomes) or duplication.
Rapid divergence obscuring ancestry in rapidly evolving viruses is by no means unusual, and more careful sequence comparison will undoubtedly turn up more relatives (just as happened with prokaryotes).
Summary:
So, the solutions to the ORFan “puzzle”, as outlined by Yin and Fischer (poor sampling, horizontal transfer, rapid evolution), follow the same lines as my previous Panda’s Thumb posts. (I also included annotation errors, known to produce a proportion of alleged prokaryotic “ORFans”. These annotation errors are likely to be substantial in small genomes as well.) It is instructive to compare the number of ORFans in various genomes (as they currently stand). The human genome has 0% ORFans [see note], prokaryotes an average of around 7% and viruses around 30%. Now, it may be that ORFans represent artifacts, poor sampling and rapidly evolving genes (which would explain why rapidly evolving, undersampled and exceedingly diverse groups like viruses have more ORFans than prokaryotes or humans). Or the Designer really has an inordinate fondness for viruses.
Note: Paul Nelson objects to the paper that eliminated the last of the ORFans from the human genome (Clamp et al., 2007), as he claims that they did this on purely evolutionary reasoning. He is wrong; they also looked at whether these sequences were significantly different to random sequences, and whether they were expressed as protein. They weren’t and they aren’t. This is good evidence that they are artifacts. Larry Moran has a good discussion of ORFans at the Sandwalk.
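As a footnote to the fold-increase comparison earlier in the post: if ORF’s grow by one factor and ORFans by a smaller one, the ratio of ORFans to ORF’s shrinks by the ratio of the two factors, whatever it was at the 15-genome baseline. A rough Python sketch follows. The function name `relative_orfan_ratio` is hypothetical, and the ORFan fold values are approximations of the wording in the post ("just over a doubling" entered as 2.0, "less than eight fold" entered as its upper bound of 8.0), not numbers read from the underlying data.

```python
# Fold increases relative to the counts at 15 sequenced genomes, as quoted in
# the post. The ORFan values are approximations: "just over a doubling" is
# entered as 2.0 and "less than eight fold" as its upper bound, 8.0.
FOLD_INCREASE = {
    60: {"orfs": 4.5, "orfans": 2.0},
    330: {"orfs": 25.0, "orfans": 8.0},
}


def relative_orfan_ratio(orf_fold: float, orfan_fold: float) -> float:
    """ORFan-to-ORF ratio, expressed relative to the ratio at the baseline.

    A value of 1.0 means ORFans kept pace with ORFs; below 1.0 means the
    ORFan-to-ORF ratio has fallen since the 15-genome baseline.
    """
    if orf_fold <= 0:
        raise ValueError("orf_fold must be positive")
    return orfan_fold / orf_fold


if __name__ == "__main__":
    for genomes, folds in sorted(FOLD_INCREASE.items()):
        ratio = relative_orfan_ratio(folds["orfs"], folds["orfans"])
        print(f"{genomes} genomes: ORFan-to-ORF ratio is about "
              f"{ratio:.2f} of its 15-genome value")
```

Because the 330-genome ORFan figure is an upper bound, the true ratio at that point is at most the value the sketch reports.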
38 Comments
PvM · 14 May 2008
Nice.
David Stanton · 14 May 2008
Why does every creo who manages to read a paper conclude that it is something that is unexplainable by modern evolutionary theory? And why do they never seem to notice that the real authors never have this problem? If the papers ever proved what is claimed the authors would have pointed it out and become famous. Wonder why they didn't bother?
On another thread some yahoo was trying to claim that genes that are "closely related" in different organisms somehow disprove common descent. Now some guy is trying to claim that if genes are not similar to other genes it disproves common descent. Make up your minds, guys. Learn what the modern theory of evolution really predicts and stop making up fake scenarios that evolution supposedly cannot explain. And stop ignoring basic genetic mechanisms that are well understood by real scientists; that just makes you look ignorant.
hje · 14 May 2008
Wolfhound · 14 May 2008
hje for the win!
James F · 14 May 2008
hje FTW indeed. Don't forget paranoid delusion (about Big Science, the Global Darwinist Conspiracy™, etc.).
You could set the denialism to a music video. Oh wait, someone has.
Henry J · 14 May 2008
Charley · 14 May 2008
One minor point: the Firmicutes are Gram positive. E. coli is Gram negative and in the phylum Proteobacteria.
Zaius · 14 May 2008
Science Avenger · 15 May 2008
Add: an exaggerated confidence in their ability to discern the truth of a complex scientific argument merely by thinking about it a lot.
In other words, truthiness.
steve s · 15 May 2008
Has Paul Nelson ever made a respectable argument? Ever? I can understand why his 'ontogenetic depth' monograph is indefinitely postponed. I bet it sucks out loud.
DaveH · 15 May 2008
Frank J · 15 May 2008
When the subject is "junk DNA," IDers always react with: "we don't really know that it's 'junk' because they keep finding functions all the time."
When the subject is "ORFans," can we expect them to react consistently with: "we don't really know that they're ORFans, because they keep finding relatives all the time"?
If they have a double standard, it would be especially curious since ID itself has never ruled out common descent and some IDers have conceded it outright.
Nigel D · 15 May 2008
Thanks, Ian, for a very thorough (and thoroughly referenced) summary of the state of our knowledge of prokaryotic and phage ORFans.
I do have a nitpick - your use of apostrophes in the plural of ORF ("ORF's") is superfluous, and implies a contraction or a possessive, neither of which seems to be the case from the context. "ORFs" does the job of indicating the plural quite nicely.
Frank J · 15 May 2008
Ron Okimoto · 15 May 2008
The designer loves prokaryotes. Obviously, metazoans were only designed as condominiums for the little beasties. When the designer comes back and finds one upstart species using antibiotics and doing other terrible things like bathing and chlorinating the water, there will be hell to pay.
William E Emba · 15 May 2008
wamba · 15 May 2008
Stacy S. · 15 May 2008
Darn it! I am still trying to get through my biology book so I can understand this stuff. Aauurrgghh!
@ James - another great video :-)
AnswersInGenitals · 15 May 2008
Every issue in the life sciences represents "a significant challenge to evolutionary biology". It is evolutionary biology's exquisite ability to address these challenges, explain the physical evidence, and provide the historic underpinnings of all these issues that has led to the almost universal acceptance (amongst those who actually study biology and understand the evidence) of Darwinian evolutionary theory. The historic progress you demonstrate concerning ORFans is an excellent example of this continuing process.
David Stanton · 15 May 2008
AnswersinGenitals,
Well said. Love the handle.
slang · 15 May 2008
steve s · 16 May 2008
steve s · 16 May 2008
I would go ahead and start a thread for it at After the Bar Closes, but that thread would just languish, as Paul flies around the world, talking his way into free meals and trips, dispatching horrible old arguments against evolution, and never delivering the goods.
steve s · 16 May 2008
substitute "deploying" for "dispatching" in the above.
slang · 16 May 2008
Well, steve, I'm not aiming for humiliation, although I'll admit that it's a little bit of a cheap shot. Paul Nelson (or someone using his name) always seemed nice and polite when he used to post here, and it's not like I have anything to show in science myself. Nonetheless, the promise of publication still stands, AFAIK.
It would just be nice, for once, to see one of them saying something like "alright guys, I really thought I was on to something significant here, but I've read the response, thought about it some more, and now see that I was wrong". Oh well, I can dream, can't I? :)
Frank J · 16 May 2008
Philip Bruce Heywood · 16 May 2008
Maybe they're waiting for the technology to arrive to enable a level-headed and detached person to make a decision, rather than rushing in where angels fear to tread.
That's excluding the tiny minority that has a sectarian, slavish view of Scripture interpretation, which minority only has credibility now because people who will not learn from history and will not employ systematic, disciplined methodology, keep rushing in.
Jim Wynne · 16 May 2008
Frank J · 16 May 2008
It took me several reads, but PBH, in the 6:06 comment, seems to be saying:
"Pay no attention to those who claim that scripture overrules evidence, but rather pay attention to people like Behe who admit that (1) doing so is silly, but also (2) don't rush in to any explanation that could actually be testable and useful, when you could always take the loophole that 'we're still waiting for technology to catch up to our bag of intuitions,' until which time (which is never) the official answer is 'don't ask, don't tell'."
BTW, the "sectarian, slavish view of Scripture interpretation" may be a tiny minority among scientists, but it is about half of the general public. And anti-evolution activists miss few opportunities to exploit that difference.
Art · 17 May 2008
Paul, it seems to me as if your point regarding ORFans hinges on some supposed impossibility of new proteins arising via mechanisms commonly at work in the cell. (If this is not so, then perhaps you could stop by and correct me.) This essay shows one way in which you are wrong. Scientists working in other systems are catching up with earlier generations of plant scientists in this regard.
As I see things, the matters of the reliability of the estimates of ORFan numbers notwithstanding, your argument is based on a fundamentally erroneous assumption.
(I know, I know, you need some sort of subscription to see the second item. The title of the cited article - "De novo origination of a new protein-coding gene in Saccharomyces cerevisiae", by Cai et al., says enough for now. The article is slated to be in print in the May 1 issue of Genetics.)
Stuart Weinstein · 17 May 2008
Mr. H.E.Thoma · 18 May 2008
Well!! Thank the so-called creator (OR DESIGNER) that I am not an ORFan, if you will, and that I have a mother and father and one hundred and seven living relatives, but you could not tell by looking at 'em. Maybe they are the ORF's or ORFans, and the ORF's are taking vitamins, and the ORFans are doing drugs. Adios, ORFans. The good book says from dust you were made and unto dust you will return, so why worry about everything in between? Which brings me to this thought: if there is a designer (GOD), why would he cause all this confusion, when his main purpose is to make us believe he exists and love him? Why would he not leave more simple footprints for you to find? For he surely would know that man would try and prove it. So dedicate yourself and your research to the betterment of life and forget about design.
respectfully yours
Mr H.E. Thoma
Tyrannosaurus · 19 May 2008
Ontogenetic depth... the very name causes giggles. I have been waiting for years now for this monumental monograph so I can laugh hard and long. Paul, please do not deprive all of us here of a good laugh. Publish the ontogenetic depth monograph ASAP.... :-)
Nigel D · 21 May 2008
peter borger, biologist, PhD · 26 May 2008
Ian,
What you missed in the literature (and what I did not miss because I am in the medical field) is that there are no real orphan genes. Orphan genes can often be found in RNA viruses and ERVs. Why are they in ERVs? Because an ERV is not an integrated virus. Because ERVs are not remnants of ancient viruses but rather they are (derailed) VIGEs (variation inducing genetic elements). VIGEs were designed to rapidly generate variation. The simplest virus is a three-element VIGE plus an oncogene. It wasn't an oncovirus that integrated in the genome. Of course not. How could an oncovirus evolve without the genome it requires for replication? Darwinians have largely ignored the RNA virus paradox (i.e. all RNA viruses have a common ancestor around 40-50 thousand years ago) because it cannot be solved within the required time frame. Due to their paradigm, Darwinians have failed to recognize VIGEs as important genetic elements to induce variation from the inside. Junk DNA is the other fatal flaw of the Darwinian paradigm. Variation inducing genetic elements, my dear, that is what makes up the junk.
Henry J · 27 May 2008
Funny, I thought "junk" DNA was a surprise to those who found it, rather than a prediction of the then current theory?
Henry
Torbjörn Larsson, OM · 27 May 2008
Torbjörn Larsson, OM · 27 May 2008