Database Resources for Genomics Researchers

Posted 18 April 2005 by Matt Brauer

Everyone with a serious interest in biology is aware of PubMed and Genbank, the major literature and sequence databases at NIH. But there are a large number of more specialized and professionally curated databases that allow the researcher unparalleled glimpses into an organism’s genetics, molecular biology, physiology and evolution. Over the next few articles, I’d like to highlight some of these databases, showing how they are used and how critical they are for an understanding of the organism. These databases often have deep and highly technical content, but they also have much that is accessible to the non-specialist. And in genomics there are always more questions being asked than there are people to supply answers. As with planetary image analysis and observational astronomy, there may be some areas of opportunity for talented amateurs (especially those with a computational background) who wish to make a contribution to the field.

The first database I’ll be highlighting is the one I use most: several times a day, at least. In fact, I use it so much that the Saccharomyces Genome Database (SGD) is my browser homepage.

Saccharomyces cerevisiae is the common baker’s or brewer’s yeast. It is a budding yeast, which means that it reproduces by extruding “daughter cells” rather than by undergoing fission and forming two equivalent offspring cells. Yeast are eukaryotic microbes: although they are free-living single-celled organisms, they have all of the machinery, and most of the genes, that other eukaryotes (including humans) have. S. cerevisiae is one of an estimated 1.5 million extant species of fungi. It is a highly derived fungus, though, having lost its ability to form hyphae (the long filaments in which many fungi grow). It is a heterotroph, meaning that, like other fungi, it must consume existing organic material rather than making its own by photosynthesis.

Baker’s yeast is a valuable model organism for several reasons. Although highly derived, it is nevertheless the prototypical eukaryote, unencumbered by the requirements of multicellularity. Its genetics are lovely and elegant: one mode of reproduction is via an ascus, a membrane-bound sac that contains the four spores that result from a single meiosis. Because all of the products of that meiosis are kept together, the researcher can keep track of crossover events with extreme ease. Most genetic traits can be mapped to an arbitrary level of precision by examining larger numbers of spores.
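As an illustration of how kept-together meiotic products turn into a map, here is a minimal sketch of one standard tetrad-analysis method, Perkins’ formula. The post does not name a specific method, so this is an assumption on my part. For two markers, each ascus is scored as a parental ditype (PD), nonparental ditype (NPD) or tetratype (T). The formula then estimates distance in centimorgans as 100 × (T + 6·NPD) / (2 × total), which corrects for double crossovers that would otherwise go undetected. The function name and the counts below are hypothetical.

```python
def perkins_map_distance(pd: int, npd: int, t: int) -> float:
    """Estimate map distance (centimorgans) from unordered tetrad counts.

    pd  -- number of parental ditype asci
    npd -- number of nonparental ditype asci
    t   -- number of tetratype asci

    Perkins' formula: cM = 100 * (T + 6 * NPD) / (2 * total).
    """
    total = pd + npd + t
    if total == 0:
        raise ValueError("need at least one scored tetrad")
    return 100 * (t + 6 * npd) / (2 * total)


# Hypothetical counts for two linked markers (not real data).
print(perkins_map_distance(pd=80, npd=2, t=18))  # prints 15.0
```

Scoring more tetrads does not change the formula. It only reduces the sampling uncertainty of the estimate, which is why mapping precision improves with more spores.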

Yeast have roughly 6,500 genes (about one-fifth as many as humans), many of which have clearly identifiable homologs in multicellular plants and animals. The genome has been completely sequenced, and functions have been verified for about 65% of these genes (see Genome Snapshot). There are well-developed microarray platforms for analyzing gene expression, chromosome organization, and the complement of chromosome-binding proteins in Saccharomyces cultures (see the Stanford or Princeton microarray databases). Nearly every gene in Saccharomyces has been systematically deleted, producing libraries of knock-out mutants. Other strain collections that take advantage of the tractability of yeast are constantly in development.

If you are interested in cellular physiology, molecular biology or microbial population biology, it’s hard to see how there can be a better model system.

SGD is the central clearinghouse for all of this information. With the database you can explore the genetic and chromosomal locations and properties of a gene, the conditions under which it is expressed, the properties, interactions and cellular location of its product, and the relevant recent literature from which our knowledge of the gene is derived.

As an example, look at the gene known as Leu3. Its systematic name is YLR451W. The leading “Y” stands for yeast, and the rest means it’s on the right (“R”) arm of chromosome 12 (“L”, the twelfth letter), the 451st gene from the centromere, transcribed in the plus direction (from the “Watson” strand, “W”). Leu3 is a regulator of the branched chain amino acid synthesis pathway. It’s a transcription factor, localized primarily to the nucleus. Systematically deleting this gene results in a viable strain, although it is sensitive to certain conditions and is somewhat auxotrophic for leucine. From this page, you can learn what domains the Leu3 protein has, what other proteins it interacts with, and what other genes it regulates. If you follow the link to “Expression Connection Summary” under “Functional Analysis” you can see what other genes are co-expressed with Leu3 (not many, in this case) in the expression studies done to date.
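The systematic naming scheme is regular enough to decode mechanically. Below is a minimal Python sketch. It assumes only the basic seven-character form: “Y”, a chromosome letter A through P for chromosomes 1 to 16, an arm letter L or R, a three-digit ORF number, and a strand letter W (Watson) or C (Crick). Names with a trailing letter suffix and mitochondrial ORFs are deliberately not handled. The names `parse_systematic_name` and `SystematicName` are hypothetical and not part of any SGD tool.

```python
import re
from dataclasses import dataclass

# Basic nuclear ORF form only, e.g. "YLR451W"; suffixed names are rejected.
_ORF_PATTERN = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])$")


@dataclass
class SystematicName:
    chromosome: int   # 1-16: "A" is chromosome 1, "L" is chromosome 12
    arm: str          # "left" or "right" of the centromere
    orf_number: int   # ORF position counted outward from the centromere
    strand: str       # "Watson" or "Crick"


def parse_systematic_name(name: str) -> SystematicName:
    """Decode a basic yeast systematic ORF name into its parts."""
    match = _ORF_PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"not a basic systematic ORF name: {name!r}")
    chrom_letter, arm, number, strand = match.groups()
    return SystematicName(
        chromosome=ord(chrom_letter) - ord("A") + 1,
        arm="left" if arm == "L" else "right",
        orf_number=int(number),
        strand="Watson" if strand == "W" else "Crick",
    )


print(parse_systematic_name("YLR451W"))
# prints SystematicName(chromosome=12, arm='right', orf_number=451, strand='Watson')
```

A standard name such as “LEU3” is not a systematic name, so passing it raises `ValueError` rather than returning a guess.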

In short, there are several hours’ worth of exploration possible for this single gene, without even entering the primary literature.
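Finding genes that are “co-expressed” with a query gene, as the Expression Connection page does for Leu3, comes down to comparing expression profiles across many experiments. The post does not say how SGD measures that similarity. The sketch below assumes Pearson correlation between profiles, which is one common choice. The helper names, the threshold, and the gene names and values are all hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


def coexpressed(query: str, profiles: dict[str, list[float]],
                threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (gene, r) pairs with r >= threshold, strongest first."""
    target = profiles[query]
    scored = [(gene, pearson(target, profile))
              for gene, profile in profiles.items() if gene != query]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical log-ratio profiles over four experiments (not real data).
profiles = {
    "GENE_Q": [1.0, 2.0, 3.0, 4.0],
    "GENE_UP": [2.0, 4.0, 6.0, 8.0],
    "GENE_DOWN": [4.0, 3.0, 2.0, 1.0],
    "GENE_OTHER": [1.0, 3.0, 2.0, 4.0],
}
print(coexpressed("GENE_Q", profiles))  # prints [('GENE_UP', 1.0)]
```

This simple cutoff drops strongly anti-correlated genes such as GENE_DOWN (r = -1). A real analysis might want to report those as well.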

SGD is a valuable tool for the evolutionary biologist as well. The front page includes links to alignment tools for comparing Saccharomyces gene and protein sequences with those of other sequenced fungi. There is also a “synteny viewer” that displays gene organization and orientation across a group of diverged yeast species. These data are part of the evidence for the conclusion that the S. cerevisiae genome is the product of a genome-wide duplication, followed by divergence and specialization of individual paralogs (about which I may have more to say in a later post).

The depth of knowledge about this beautiful and useful organism is profound, yet I am daily made aware of how much more there is still to learn. Tools like SGD allow us to get beyond vague abstractions about biology, and to delve into the details of the cell’s processes. It is the appreciation and understanding of this detail that leads to real progress in science.

----

In future posts I plan to look in more detail at the microarray databases, some of the metabolic pathway databases, and at the “Tree of Life” web. I also hope to go into more detail about the genome duplication story and the role genome rearrangement plays in evolution. I welcome suggestions for other topics as well.

13 Comments

RPM · 18 April 2005

Yeast geneticists have such boring names for their genes (e.g., YLR451W). I suggest a post on FlyBase (the Drosophila equivalent of SGD), along with some of the more interesting gene names that Drosophila geneticists come up with.

fwiffo · 18 April 2005

So if I wanted to poke around in rat genes, or zebra fish genes, or whatever, are there databases out there for that too? It would be super-nifty-neato to know the sequence for the mutation that makes dumbo rats so adorable.

RBH · 18 April 2005

And there's the synteny viewer. Its data are not yet available via FTP, but that has been promised for the not-too-distant future.

RBH

Randall Wald · 18 April 2005

Actually, I recently found a page discussing the origins of some of the weirder Drosophila gene names:

Clever gene names

RPM · 18 April 2005

So if I wanted to poke around in rat genes, or zebra fish genes, or whatever, are there databases out there for that too? It would be super-nifty-neato to know the sequence for the mutation that makes dumbo rats so adorable.
— fwiffo

NCBI genome database: http://www.ncbi.nlm.nih.gov/Genomes/
UCSC genome browser: http://www.genome.ucsc.edu/
Drosophila database: http://www.flybase.org/
C. elegans database: http://www.wormbase.org/
Rat genome database: http://rgd.mcw.edu/
Ratmap: http://ratmap.gen.gu.se/
Zebrafish Information Network: http://zfin.org/

Google a species plus "genome database" and you'll get a whole bunch of internet resources.

Douglas Theobald · 18 April 2005

Yeast geneticists have such boring names for their genes (e.g., YLR451W). I suggest a post on FlyBase (the Drosophila equivalent of SGD), along with some of the more interesting gene names that Drosophila geneticists come up with.
— RPM

Like 'cheapdate', a gene that, when compromised, makes flies more susceptible to alcohol intoxication.

Mike Hopkins · 18 April 2005

Matt,

Your link to the database is broken. You forgot the "http://" in the link.

Great White Wonder · 18 April 2005

Like 'cheapdate', a gene that, when compromised, makes flies more susceptible to alcohol intoxication

Is that gene present in all flies, or just Spanish flies?

G-Do · 18 April 2005

RPM pretty much nailed the heavy hitters in genomics, right there. I can't give enough love to the UCSC Genome Browser. You may also be interested in GALA, a sequence annotation database; the Gene Ontology, a hierarchical vocabulary of molecular functions, biological processes, and cellular components; and, if you're a coder like me, EMBOSS, an open-source toolkit for doing sequence/genome analysis.

There are scads of other tools, too, but to use some of them you need to be affiliated with an institute of higher learning or pay cash :P

RPM · 19 April 2005

One problem with UCSC and GALA is that they are extremely focused on vertebrate (and oftentimes just mammalian) genomes. I know the folks behind GALA, and they aren't really concerned with genomics outside of vertebrates. NCBI is really the only group out there that gives every taxon equal time. The vertebrate and Drosophila genome projects really complement each other well, and it would be fantastic if there were more interaction between the two groups.

By the way, Apollo is the program of choice in the Drosophila community for genome annotation and analysis.

Tara Smith · 19 April 2005

Don't forget microbes!

http://www.tigr.org

Greg · 19 April 2005

If I ever knew that about fly gene names, I had forgotten it. I was so glad to read those! I have worked in technical writing, mostly healthcare, for a long time now, and the jargon makes me just sick, because it never says anything, never gives a clue about what it's hoping to signify. The fly genes are actually helpful. Just like a lot of computer terms are actually helpful--mouse, click, hack. They signify something. Which leads me to think that it's not the supposed "nerds" who have a problem with jargon usually; it's business types. Business types who know that what they do doesn't actually matter, so they give things fancy-sounding names. Or maybe I'm just bitter over our recent acquisition by a corporate juggernaut.

G-Do · 19 April 2005

It's true, I'm biased toward mouse and man.