<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2010-11-5-206</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Review</dochead>
      <bibl>
         <title>
            <p>Between a chicken and a grape: estimating the number of human genes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Pertea</snm>
               <fnm>Mihaela</fnm>
               <insr iid="I1"/>
            </au>
            <au ca="yes" id="A2">
               <snm>Salzberg</snm>
               <mi>L</mi>
               <fnm>Steven</fnm>
               <insr iid="I1"/>
               <email>salzberg@umd.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>5</issue>
         <fpage>206</fpage>
         <url>http://genomebiology.com/2010/11/5/206</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">20441615</pubid>
               <pubid idtype="doi">10.1186/gb-2010-11-5-206</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>5</day>
               <month>5</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>BioMed Central Ltd</collab>
      </cpyrt>
      <shorttitle>
         <p>Between a chicken and a grape: estimating the number of human genes</p>
      </shorttitle>
      <shortabs>
         <p>The number of genes in the human genome is still an estimate.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>Many people expected the question 'How many genes in the human genome?' to be resolved with the publication of the genome sequence in 2001, but estimates continue to fluctuate.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="10thanniversary" subtype="theme_series_title" type="BMC">10th Anniversary Collection</classification>
         <classification id="10thanniversary" subtype="theme_series_editor" type="BMC"/>
         <classification id="30010002" subtype="man_spc_id" type="BMC">Bioinformatics</classification>
         <classification id="30010009" subtype="man_spc_id" type="BMC">Genetics</classification>
         <classification id="30010010" subtype="man_spc_id" type="BMC">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p/>
         </st>
         <p>Ever since the discovery of the genetic code, scientists have been trying to catalog all the genes in the human genome. Over the years, the best estimate of the number of human genes has grown steadily smaller, but we still do not have an accurate count. Here we review the history of efforts to establish the human gene count and present the current best estimates.</p>
         <p>The first attempt to estimate the number of genes in the human genome appeared more than 45 years ago, while the genetic code was still being deciphered. Friedrich Vogel published his 'preliminary estimate' in 1964 <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, based on the number of amino acids in the alpha- and beta-chains of hemoglobin (141 and 146, respectively). Knowing that three nucleotides corresponded to each amino acid, he extrapolated to compute the molecular weight of the DNA comprising these genes. He then made several assumptions in order to produce his estimate: that these proteins were typical in size (they are actually smaller than average); that nucleotide sequences were uninterrupted on the chromosomes (introns were discovered more than 10 years later <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>); and that the entire genome was protein coding. All these assumptions were reasonable at the time, but later discoveries would reveal that none of them was correct. Vogel then used the molecular weight of the human haploid chromosomes to correctly calculate the genome size as 3 &#215; 10<sup>9</sup> nucleotides and, dividing that by the size of a 'typical' gene, arrived at an estimate of 6.7 million genes.</p>
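         <p>As a rough check on these figures, one can back-calculate the 'typical' gene size implied by Vogel's result. The sketch below is illustrative only: Vogel worked from molecular weights, whereas this version simply divides the genome size by the gene count quoted above, and the variable names are hypothetical.</p>

```python
# Figures quoted in the text (Vogel, 1964).
GENOME_NUCLEOTIDES = 3e9      # haploid genome size
ESTIMATED_GENES = 6.7e6       # Vogel's estimate
HEMOGLOBIN_CHAINS = {"alpha": 141, "beta": 146}  # amino acids per chain

# Gene length implied by dividing the genome evenly among the genes.
implied_nt = GENOME_NUCLEOTIDES / ESTIMATED_GENES
implied_aa = implied_nt / 3   # three nucleotides per amino acid

print(f"implied gene length: {implied_nt:.0f} nt, about {implied_aa:.0f} amino acids")
for chain, residues in HEMOGLOBIN_CHAINS.items():
    print(f"{chain}-chain coding length: {residues * 3} nt")
```

         <p>The implied length, roughly 450 nucleotides, is close to the coding length of the two hemoglobin chains, as expected if those proteins were taken as typical.</p>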
         <p>Even at the time, Vogel found this number 'disturbingly high', but no one suspected in 1964 that most human genes were interrupted by multiple introns, nor did anyone know that vast regions of the human genome would turn out to contain seemingly meaningless repetitive sequences. Since Vogel's initial attempt, many scientists have tried to estimate the number of genes in the human genome, using increasingly sophisticated molecular tools. Over the years, the number has gradually come down, in a process that has been humbling at times, as we realized that many other species - even plants - are predicted to have more genes than we do (Figure <figr fid="F1">1</figr>). An estimate of 100,000 genes appeared in the 1990 joint National Institutes of Health (NIH)/Department of Energy (DOE) report on the Human Genome Project <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>; this was apparently based on a very rough (and incorrect) calculation that typical human genes are 30,000 bases long, and that genes cover the entire 3-gigabase genome.</p>
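         <p>The rough calculation behind the 100,000 figure is easy to reproduce. The sketch below is an illustration of the arithmetic described above, not code from the 1990 report; the function name is hypothetical.</p>

```python
def genes_if_fully_covered(genome_bases, gene_length_bases):
    """Gene count if genes of one fixed length tiled the whole genome."""
    return genome_bases / gene_length_bases

# Figures quoted in the text: a 3-gigabase genome and 30,000-base genes.
estimate = genes_if_fully_covered(3_000_000_000, 30_000)
print(f"{estimate:,.0f} genes")
```

         <p>Both inputs are flawed, as noted above: genes do not cover the entire genome, and a single fixed gene length is a crude simplification.</p>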
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Gene counts in a variety of species</p>
            </caption>
            <text>
               <p><b>Gene counts in a variety of species</b>. Viruses, the simplest living entities, have only a handful of genes but are exquisitely well adapted to their environments. Bacteria such as <it>Escherichia coli </it>have a few thousand genes, and multicellular plants and animals have two to ten times more. Beyond these simple divisions, the number of genes in a species bears little relation to its size or to intuitive measures of complexity. The chicken and grape gene counts shown here are based on draft genomes <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp> and may be revised substantially in the future.</p>
            </text>
            <graphic file="gb-2010-11-5-206-1"/>
         </fig>
         <p>Many people, including many geneticists, expected that we would have a definitive gene count when the human genome was finally completed, and indeed one of the main surprises upon the initial publication of the human genome in February 2001 <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> was that the number had again dropped, quite precipitously. However, as we shall see, the publication of the human genome did not come anywhere close to producing a precise gene list or even a gene count, and in the years since, the number has continued to fluctuate. As a result, even today's best estimates still carry considerable uncertainty.</p>
         <p>In order to count genes, we need to define what we mean by a 'gene', a term whose meaning has changed dramatically over the past century. For our discussion, we will restrict the definition of gene to a region of the genome that is transcribed into messenger RNA and translated into one or more proteins. When multiple proteins are translated from the same region due to alternative mRNA splicing, we will consider this collection of alternative isoforms to be a single gene. In this respect, our definition of a gene is equivalent to what may also be called a chromosomal locus. We will exclude non-protein-coding RNA genes (such as microRNAs (miRNAs) and small nuclear RNAs (snRNAs)), in part because of the even greater uncertainty surrounding their numbers. In recent years, as a result of breakthroughs in our understanding of RNA interference <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and miRNAs <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, the number and variety of known RNA genes have grown substantially, and we expect that it will be many more years before we have a clear picture of how many of these non-coding genes exist in the human genome.</p>
      </sec>
      <sec>
         <st>
            <p>Estimates based on transcription</p>
         </st>
         <p>With the advent of automated DNA sequencing, it became possible to use sequencing methods to estimate the number of human genes more accurately. The most promising approach, which was used by many groups in the 1990s, was to capture mRNA transcripts in a cell by making use of the polyadenylated (poly(A)) 3' ends. Using poly(T) sequences as primers, researchers could use reverse transcription-polymerase chain reaction (RT-PCR) to capture and sequence large numbers of expressed genes in a cell. At a time when the human genome project was just getting under way, these expressed sequence tags (ESTs) represented a shortcut to capturing the protein-coding genes in the genome <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. In 1995, one of the first large-scale surveys of human genes <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> used this approach to construct 300 complementary DNA (cDNA) libraries from 37 distinct organs and tissues, and obtained 87,983 distinct sequences, many of them assembled from multiple overlapping ESTs. This result was consistent with the NIH/DOE estimate of 100,000 genes in the human genome <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
         <p>In the mid-1990s, a series of papers produced estimates based on ESTs that generally agreed on a human gene count of 50,000 to 100,000 genes (Figure <figr fid="F2">2</figr>). In 1993, Antequera and Bird <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> estimated that the human genome contained 45,000 CpG islands. These are stretches of genomic DNA with a relatively high density of CG dinucleotides. Combining this with their report that 56% of sequenced genes at that time (1993) were associated with CpG islands, they calculated a total human gene count of 80,000. The following year, Fields <it>et al. </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> relied primarily on ESTs to produce an estimate of 64,000 genes, although this estimate relied critically on an uncertain estimate of the 'redundancy' of EST sequence databases, which they guessed to be 50%.</p>
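         <p>The CpG island estimate follows from a single division. The sketch below reproduces it under one explicit assumption that we add for illustration: each gene associated with a CpG island is counted as exactly one island. The function name is hypothetical.</p>

```python
def genes_from_cpg_islands(island_count, fraction_of_genes_with_island):
    """Total genes, assuming one island per island-associated gene."""
    return island_count / fraction_of_genes_with_island

# Figures quoted in the text: 45,000 islands, 56% of genes island-associated.
estimate = genes_from_cpg_islands(45_000, 0.56)
print(f"{estimate:,.0f} genes, or about {round(estimate, -3):,.0f}")
```

         <p>The result is sensitive to the 56% figure, which reflected only the genes that had been sequenced by 1993.</p>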
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>The trend of human gene number counts together with human genome-related milestones</p>
            </caption>
            <text>
               <p><b>The trend of human gene number counts together with human genome-related milestones</b>. Individual estimates of the human gene count are shown as blue diamonds. The range of estimates at different times is shown by the two vertical blue dotted lines. Note how this range has narrowed in recent years.</p>
            </text>
            <graphic file="gb-2010-11-5-206-2"/>
         </fig>
         <p>These two estimates, 64,000 and 80,000, reduced the expected gene count somewhat, but even in 1994 there was little agreement on which number was closer to the truth <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. In a study that unified physical maps, genetic maps, and the sequence data available at the time, Schuler <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp> reported in 1996 that the genome held 50,000 to 100,000 genes, although their mapping effort only captured 16,000.</p>
         <p>In 2000, shortly before the human genome was published, several additional estimates appeared: Roest Crollius <it>et al. </it><abbrgrp><abbr bid="B16">16</abbr></abbrgrp> estimated 28,000 to 34,000 genes using alignments to pufferfish, and two new EST-based estimates reported 35,000 <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and 57,000 <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> genes. This set the stage for the human genome paper, which was soon to appear.</p>
      </sec>
      <sec>
         <st>
            <p>Methods for identifying human genes</p>
         </st>
         <p>To better understand the source of this continuing uncertainty about the gene count, it is instructive to mention a few of the most significant advances in computational gene prediction. (For a more comprehensive review of gene structure prediction methods, the interested reader can consult several recent reviews <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>.)</p>
         <p>One of the oldest and most reliable ways to identify a gene in a newly sequenced genome is by locating a highly similar protein-coding sequence in another organism. Together with EST and cDNA alignments, gene finding by homology is the first step in all the major annotation pipelines. But even the most thorough EST sequencing projects fail to capture many exons and genes. The discovery of these genes is still dependent, at least in part, on <it>de novo </it>gene finders that only require information inherent in the DNA sequence itself.</p>
         <p>Computational gene recognition began about 30 years ago, when it was observed that statistical analysis could detect differences between protein-coding and non-coding nucleotide sequences <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. Early gene-prediction programs attempted to identify relatively few properties of genes, such as the signals around splice sites, and they made simplifying assumptions to make the problem more tractable <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. With the development of gene-finding systems designed to predict any number of complete gene structures transcribed from either strand of the genome, automated methods made a significant step forward. The most successful framework for these systems was the generalized hidden Markov model (GHMM) approach. Thanks to their modularity and to their capability to model variable-length features, GHMMs are well suited to modeling the statistical properties of genes. Genscan <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> was one of the first of these, in 1997, and it was also the first <it>de novo </it>gene predictor to reach 80% exon-level accuracy on a human benchmark set. Despite its performance on coding exons, Genscan's gene-level accuracy (the proportion of genes for which it correctly predicts every exon) on the human genome was only about 10%. One reason for the low gene-level accuracy is that typical human genes contain 5 to 10 exons, and even at 80% accuracy per exon, the likelihood of getting all the exons correct for any particular gene is low.</p>
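         <p>The gap between exon-level and gene-level accuracy can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, that each exon is predicted correctly with the same probability and independently of the others; real exon errors are not independent, so this is not a description of how Genscan was evaluated.</p>

```python
def prob_all_exons_correct(per_exon_accuracy, exon_count):
    """Chance that every exon of a gene is right, assuming independence."""
    return per_exon_accuracy ** exon_count

# 80% exon-level accuracy, for genes with 5 to 10 exons (figures from the text).
for exon_count in range(5, 11):
    p = prob_all_exons_correct(0.8, exon_count)
    print(f"{exon_count:2d} exons: {p:.1%} chance of a fully correct gene")
```

         <p>Under this simplified model, a 10-exon gene is predicted entirely correctly only about 11% of the time, which is in line with the gene-level accuracy of about 10% quoted above.</p>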
         <p>Although later gene finders would improve on Genscan's results, the next real leap in accuracy came with the development of comparative gene finders. Comparative gene finders use patterns of conservation between two related species, such as human and mouse, to predict the location and structure of protein-coding genes. They can also use the GHMM framework. The biggest effect of using two genomes at once was to reduce the number of false-positive predictions: using human-mouse alignments, Twinscan <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, a dual-genome gene finder, predicted 25,600 human genes versus 45,000 predicted by Genscan <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
         <p>Until 2007, GHMMs were the dominant framework for <it>de novo </it>gene finders, but this changed when conditional random fields (CRFs), a new class of discriminative models, were introduced as a means of using more than two genomes simultaneously. Unlike GHMMs, which are trained by maximum likelihood to generate sequences statistically similar to actual DNA sequences, CRFs are trained to discriminate between genomic elements of interest in order to maximize annotation accuracy. In addition, they are capable of utilizing external evidence and submodels that are not inherently probabilistic <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Through the use of 11 informant genomes, CONTRAST <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> predicted the exact exon-intron structure of 59% of known human protein-coding genes, compared to 25 to 35% from the best previous methods. This is a very strict measure of accuracy: if even one splice site from a multi-exon gene is incorrect, the entire gene is considered to be wrong. But also note that all <it>de novo </it>methods have a significant false-positive rate, predicting many exons (and genes) that do not appear to be genuine. Pseudogenes are one source of false predictions, although the precise reasons for high false positive rates have never been fully determined.</p>
         <p>Despite a steady increase in accuracy over the years, <it>de novo </it>gene predictors are still not accurate enough to rely on for the definitive human gene list. Much greater gains in accuracy have been made through advances at the level of integrative evidence-based methods, such as those employed by JIGSAW <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. By effectively combining multiple forms of evidence generated from a diverse set of sources, including gene finders, protein sequence alignments, EST and cDNA alignments, and splice-site predictions, JIGSAW's predictions are exactly correct for approximately 75% and partially correct for 97% of human genes <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Similar integrated methods are used to generate the gene lists at Ensembl <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> and the National Center for Biotechnology Information (NCBI), which uses the Gnomon system <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>How many genes do we find today?</p>
         </st>
         <p>The release of the draft human genome sequence in 2001 revealed a much lower human gene count than expected <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B34">34</abbr></abbrgrp>. The paper published by the public consortium estimated 30,000 to 40,000 protein-coding genes. This number was in rough agreement with the count in the private consortium's paper, which reported 26,588 protein-coding genes with 'strong' evidence, and an additional 12,000 computationally predicted genes with weaker evidence. Strong evidence included similarity to previously known proteins, homology to another mammal, and EST evidence. Weak genes were those with homology to mouse but no other supporting evidence. After 3 years of detailed finishing work, a much more complete draft genome was published in 2004 <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and along with this more complete sequence, the public consortium announced a new, much lower, estimate of human protein-coding genes, only 20,000 to 25,000. This low number - lower even than that of the model plant <it>Arabidopsis thaliana </it>- was surprising to scientists across a wide range of fields, who had expected the number of genes to be a measure of organismal complexity. Furthermore, the imprecision of the estimate raised questions about the validity of many predicted genes <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
         <p>Although the near-finished human genome sequence now covers 99% of the euchromatic (or gene-containing) genome at 99.999% accuracy, the exact number of human genes is still unknown. The two leading repositories of genome annotation, relied on by most researchers looking for genes, are the databases at Ensembl and NCBI. At present, Ensembl lists 22,619 human protein-coding genes, which is 286 higher than the 22,333 protein-coding genes in NCBI's RefSeq database <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. This Ensembl total excludes 1,002 genes mapped onto alternative MHC regions in chromosome 6. The gene count from NCBI includes all protein-coding genes in RefSeq that either have been manually curated or that have supporting cDNA evidence, and that map onto the current human reference assembly (GRCh37). Another popular resource, the University of California at Santa Cruz (UCSC) genome browser <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, lists 21,814 'known' protein-coding genes <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The 'known' genes list was created by mapping human RefSeq mRNA sequences to the genome.</p>
         <p>In an effort to identify a core set of human genes that are universally agreed upon, the collaborative consensus coding sequence project (CCDS) tracks identical protein annotations that are consistently represented at NCBI, Ensembl, and the UCSC Genome Browser <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. As of January 2010, CCDS contained 18,173 human genes that are shared by all three browsers (counting alternative splice variants, where one gene is represented by two or more loci, it lists 23,739 protein-coding loci). Because CCDS takes an extremely conservative strategy, its gene list represents a lower bound on the total number of human genes. Indeed, in its original incarnation in 2005, it listed only 13,142 genes, and the total has steadily grown since then.</p>
         <p>Currently, the average number of genes listed in the human gene catalogs appears to be somewhere around 22,500, with an uncertainty of around 2,000 genes. One recent report claims that this number is much too high: Clamp <it>et al. </it><abbrgrp><abbr bid="B41">41</abbr></abbrgrp> used a conservation-based method, relying on similarity to the mouse and dog genomes as well as other techniques, to reduce it to about 20,500 'valid' protein-coding genes. They discarded as invalid genes that appeared to be retroposons, pseudogenes, and other miscellaneous artifacts, as well as 'orphan' DNA sequences. These orphans have many features of protein-coding genes, but are not conserved in other mammalian genomes, including those of chimpanzees and macaques. Because there were a relatively large number of orphans compared with the otherwise very small gene differences between humans and chimps, Clamp <it>et al. </it>rejected as implausible the alternative hypothesis that the orphans are human-specific genes.</p>
         <p>Recently, the Mammalian Gene Collection (MGC), a multi-year effort to produce full-length cDNA clones for all human genes, reported the completion of its work <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. This report describes 18,877 human protein-coding genes 'with curated RefSeq transcripts', of which MGC has produced clones for 17,421 (92%). The same report noted that recent efforts using comparative sequence data and computational gene finding, followed by confirmation with RT-PCR, had confirmed 563 distinct genes that were missing from the cDNA-based RefSeq and Vega collections <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> at the time. The MGC also excluded the transcripts of many single-exon genes and genes shorter than 100 amino acids, in order to avoid including pseudogenes, although their own report found that out of a set of 351 'likely' single-exon genes, 198 (57%) were confirmed via RT-PCR <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Thus, although the 18,877 number is substantially lower than the total in Ensembl and RefSeq, at least some of the discrepancy is due to the conservative strategy used to identify protein-coding genes by the MGC.</p>
      </sec>
      <sec>
         <st>
            <p>Novel genes</p>
         </st>
         <p>Comparative genome analysis suggests that the numbers of protein-coding genes are not expected to differ very much from mammal to mammal <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. When new genes arise in a species, most such cases are the result of duplications of previously existing genes, followed by neofunctionalization <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. However, entirely novel genes must arise at some point, although the rate of gene 'birth' is not precisely known. Interestingly, a recent study provides the first evidence for the <it>de novo </it>origin of human protein-coding genes, which evolved from non-coding DNA after the divergence of humans and chimpanzees. In this study, Knowles and McLysaght <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> identified three entirely novel genes, all of which have strong mRNA expression evidence supporting transcription, and peptide matches from proteomics databases supporting translation. The orthologous DNA sequence exists in other primate genomes - chimp, macaque, gorilla, gibbon, and orangutan - but in the other primates, the DNA has disabling mutations that disrupt the reading frame. By extrapolating their findings to the whole human genome, the authors estimate that 18 genes are likely to have arisen <it>de novo </it>in humans since our divergence from chimps.</p>
      </sec>
      <sec>
         <st>
            <p>Different humans have different gene counts</p>
         </st>
         <p>In addition to the ongoing uncertainty about the precise number of protein-coding genes, recent evidence has emerged that makes it clear that different humans have slightly different individual gene sets. A major source of such differences is variation in the number of segmental duplications scattered across the genome. Sebat <it>et al. </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp> looked at 20 individuals for copy-number polymorphisms, and found 70 different genes included in regions with variable copy numbers. Iafrate <it>et al. </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp> found more than 100 gene-containing regions that varied in copy number among individuals. Most recently, Alkan <it>et al. </it><abbrgrp><abbr bid="B48">48</abbr></abbrgrp> estimated, on the basis of three sequenced human genomes, that gene counts vary by 73 to 87 genes between any two individuals.</p>
         <p>In another recent finding, Li <it>et al. </it><abbrgrp><abbr bid="B49">49</abbr></abbrgrp> sequenced and assembled two human genomes, one from Africa and one from Asia, and compared them with the reference human genome at NCBI. They identified around 5 Mb of novel sequence in each of the new genomes, and they estimate that the human 'pangenome', which would include all the DNA of every individual human, should have up to 40 Mb of sequence additional to the reference genome, including an unknown number of genes. This additional potential sequence is 1.3% of the genome, which suggests that the eventual gene count might grow by about that same amount.</p>
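         <p>The 1.3% figure and its consequence for the gene count can be checked directly. The sketch below takes the 3-gigabase genome size and the NCBI total of 22,333 genes quoted earlier, and assumes, as the text implicitly does, that novel sequence carries genes at the same average density as the reference genome. That assumption is unverified, and the variable names are hypothetical.</p>

```python
# Figures quoted in the text.
REFERENCE_GENOME_MB = 3_000    # about 3 gigabases
PANGENOME_EXTRA_MB = 40        # upper estimate of additional sequence
REFERENCE_GENE_COUNT = 22_333  # NCBI RefSeq protein-coding genes

# Additional sequence as a fraction of the reference genome.
extra_fraction = PANGENOME_EXTRA_MB / REFERENCE_GENOME_MB

# Extra genes if the novel sequence had average gene density (an assumption).
extra_genes = REFERENCE_GENE_COUNT * extra_fraction

print(f"additional sequence: {extra_fraction:.1%} of the genome")
print(f"implied additional genes: about {extra_genes:.0f}")
```

         <p>On these assumptions the eventual increase would be on the order of 300 genes, well within the uncertainty of the current catalogs.</p>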
      </sec>
      <sec>
         <st>
            <p>So what is the likely answer?</p>
         </st>
         <p>We aligned all human genes from NCBI's RefSeq database to the Ensembl gene set in an attempt to explain the differences, but although the total counts differ by less than 300, there are several thousand genes in each set that do not map cleanly onto the other, many of them representing genes of unknown function. Our personal best guess for the total number of human genes is 22,333, which corresponds to the current gene total at NCBI. We prefer this to the slightly higher Ensembl gene count both because the NCBI annotation is slightly more conservative, and because recent compelling arguments support an even lower gene total <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. This number could easily shrink or grow by 1,000 genes in the near future. However, recent analyses make it clear that even if we agree on a complete list of human genes, any particular individual might be missing some of the genes in that list. The genome sequence is complete enough now (although it is not yet finished) that few new genes are likely to be discovered in the gaps, but it seems likely that more genes remain to be discovered by sequencing more individuals. Additional discoveries are likely to make our best estimates for this basic fact about the human genome continue to move up and down for many years to come.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Carl Kingsford for helpful comments and suggestions on the manuscript. MP and SLS were supported in part by grants R01-LM006845 and R01-GM083873 from the US National Institutes of Health.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A preliminary estimate of the number of human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Vogel</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1964</pubdate>
            <volume>201</volume>
            <fpage>847</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/201847a0</pubid>
                  <pubid idtype="pmpid">14161239</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Chow</snm>
                  <fnm>LT</fnm>
               </au>
               <au>
                  <snm>Gelinas</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Broker</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1977</pubdate>
            <volume>12</volume>
            <fpage>1</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(77)90180-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">902310</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Spliced segments at the 5' terminus of adenovirus 2 late mRNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Berget</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1977</pubdate>
            <volume>74</volume>
            <fpage>3171</fpage>
            <lpage>3175</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.74.8.3171</pubid>
                  <pubid idtype="pmcid">431482</pubid>
                  <pubid idtype="pmpid">269380</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>US Department of Health and Human Services, US Department of Energy: Understanding our Genetic Inheritance, The U.S. Human Genome Project: The First Five Years, Fiscal Years 1991-1995.</p>
            </title>
            <url>http://www.ornl.gov/sci/techresources/Human_Genome/project/5yrplan/summary.shtml</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Initial sequencing and analysis of the human genome.</p>
            </title>
            <aug>
               <au>
                  <cnm>The International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35057062</pubid>
                  <pubid idtype="pmpid" link="fulltext">11237011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The sequence of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Mural</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Amanatides</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ballew</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Huson</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kodira</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Skupski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gabor Miklos</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Broder</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Nadeau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McKusick</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Zinder</snm>
                  <fnm>N</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>291</volume>
            <fpage>1304</fpage>
            <lpage>1351</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1058040</pubid>
                  <pubid idtype="pmpid" link="fulltext">11181995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Potent and specific genetic interference by double-stranded RNA in <it>Caenorhabditis elegans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Fire</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Montgomery</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Kostas</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Driver</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Mello</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>391</volume>
            <fpage>806</fpage>
            <lpage>811</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35888</pubid>
                  <pubid idtype="pmpid" link="fulltext">9486653</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The <it>C. elegans</it> heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Feinbaum</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Ambros</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>75</volume>
            <fpage>843</fpage>
            <lpage>854</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(93)90529-Y</pubid>
                  <pubid idtype="pmpid" link="fulltext">8252621</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Complementary DNA sequencing: expressed sequence tags and human genome project.</p>
            </title>
            <aug>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Dubnick</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Polymeropoulos</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Xiao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Merril</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Olde</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Moreno</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>McCombie</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1991</pubdate>
            <volume>252</volume>
            <fpage>1651</fpage>
            <lpage>1656</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.2047873</pubid>
                  <pubid idtype="pmpid" link="fulltext">2047873</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Fuldner</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Brandon</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Cline</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Cotton</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Earle-Hughes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fine</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>FitzGerald</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Fritchman</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Geoghagen</snm>
                  <fnm>NSM</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gnehm</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Hanna</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Hedblom</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hinkle</snm>
                  <fnm>PS</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Klimek</snm>
                  <fnm>KM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>1995</pubdate>
            <volume>377</volume>
            <fpage>3</fpage>
            <lpage>174</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7566098</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A big book of the human genome. Complementary endeavours.</p>
            </title>
            <aug>
               <au>
                  <snm>Goodfellow</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1995</pubdate>
            <volume>377</volume>
            <fpage>285</fpage>
            <lpage>286</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/377285a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">7566079</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Number of CpG islands and genes in human and mouse.</p>
            </title>
            <aug>
               <au>
                  <snm>Antequera</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bird</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1993</pubdate>
            <volume>90</volume>
            <fpage>11995</fpage>
            <lpage>11999</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.90.24.11995</pubid>
                  <pubid idtype="pmcid">48112</pubid>
                  <pubid idtype="pmpid">7505451</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>How many genes in the human genome?</p>
            </title>
            <aug>
               <au>
                  <snm>Fields</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1994</pubdate>
            <volume>7</volume>
            <fpage>345</fpage>
            <lpage>346</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng0794-345</pubid>
                  <pubid idtype="pmpid" link="fulltext">7920649</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Predicting the total number of human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Antequera</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bird</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1994</pubdate>
            <volume>8</volume>
            <fpage>114</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1094-114a</pubid>
                  <pubid idtype="pmpid" link="fulltext">7842006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A gene map of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Boguski</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Gyapay</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Tom&#233;</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aggarwal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bajorek</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bentolila</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Castle</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Chiannilkulchai</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Clee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cowles</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Dibling</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Drouot</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dunham</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Duprat</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>East</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fan</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Fang</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Fizames</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Garrett</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>L</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>274</volume>
            <fpage>540</fpage>
            <lpage>546</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.274.5287.540</pubid>
                  <pubid idtype="pmpid" link="fulltext">8849440</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Estimate of human gene number provided by genome-wide analysis using <it>Tetraodon nigroviridis</it> DNA sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Roest Crollius</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Jaillon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bernot</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dasilva</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bouneau</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fischer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fizames</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brottier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Qu&#233;tier</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Saurin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>235</fpage>
            <lpage>238</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/76118</pubid>
                  <pubid idtype="pmpid" link="fulltext">10835645</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Analysis of expressed sequence tags indicates 35,000 human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Ewing</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>232</fpage>
            <lpage>234</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/76115</pubid>
                  <pubid idtype="pmpid" link="fulltext">10835644</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Gene index analysis of the human genome estimates approximately 120,000 genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Liang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pertea</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Karamycheva</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>239</fpage>
            <lpage>240</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/76126</pubid>
                  <pubid idtype="pmpid" link="fulltext">10835646</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Steady progress and recent breakthroughs in the accuracy of automated genome annotation.</p>
            </title>
            <aug>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>62</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg2220</pubid>
                  <pubid idtype="pmpid" link="fulltext">18087260</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Identifying protein-coding genes in genomic sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Harrow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nagy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Reymond</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Alioto</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Patthy</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Antonarakis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Guig&#243;</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>201</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2009-10-1-201</pubid>
                  <pubid idtype="pmcid">2687780</pubid>
                  <pubid idtype="pmpid" link="fulltext">19226436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Prediction of genomic functional elements.</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>315</fpage>
            <lpage>338</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.7.080505.115745</pubid>
                  <pubid idtype="pmpid" link="fulltext">16824019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>A search for patterns in the nucleotide sequence of the MS2 genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Erickson</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>GG</fnm>
               </au>
            </aug>
            <source>J Math Biol</source>
            <pubdate>1979</pubdate>
            <volume>7</volume>
            <fpage>219</fpage>
            <lpage>230</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/BF00275725</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The coding function of nucleotide sequences can be discerned by statistical analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Shulman</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Steinberg</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Westmoreland</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>J Theor Biol</source>
            <pubdate>1981</pubdate>
            <volume>88</volume>
            <fpage>409</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-5193(81)90274-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">6456380</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Recognition of protein coding regions in DNA sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1982</pubdate>
            <volume>10</volume>
            <fpage>5303</fpage>
            <lpage>5318</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/10.17.5303</pubid>
                  <pubid idtype="pmcid">320873</pubid>
                  <pubid idtype="pmpid" link="fulltext">7145702</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Computational methods for the identification of genes in vertebrate genomic sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>1997</pubdate>
            <volume>6</volume>
            <fpage>1735</fpage>
            <lpage>1744</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/6.10.1735</pubid>
                  <pubid idtype="pmpid" link="fulltext">9300666</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Integrating genomic homology into gene structure prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Duan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S140</fpage>
            <lpage>S148</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11473003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <aug>
               <au>
                  <snm>Majoros</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Methods for Computational Gene Prediction</source>
            <publisher>Cambridge: Cambridge University Press</publisher>
            <pubdate>2007</pubdate>
         </bibl>
         <bibl id="B29">
            <title>
               <p>CONTRAST: a discriminative, phylogeny-free approach to multiple informant <it>de novo</it> gene prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Gross</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Do</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Sirota</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R269</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2007-8-12-r269</pubid>
                  <pubid idtype="pmcid">2246271</pubid>
                  <pubid idtype="pmpid" link="fulltext">18096039</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>JIGSAW: integration of multiple sources of evidence for gene prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3596</fpage>
            <lpage>3603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti609</pubid>
                  <pubid idtype="pmpid" link="fulltext">16076884</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions.</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Majoros</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Pertea</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>Suppl 1</issue>
            <fpage>S9</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2006-7-s1-s9</pubid>
                  <pubid idtype="pmcid">1810558</pubid>
                  <pubid idtype="pmpid" link="fulltext">16925843</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Ensembl's 10th year.</p>
            </title>
            <aug>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aken</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Ballester</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Beal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bragin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clapham</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Coates</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fairley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fitzgerald</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fernandez-Banet</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gordon</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gr&#228;f</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Haider</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hammond</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jenkinson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>K&#228;h&#228;ri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Keefe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Keenan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kinsella</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kokocinski</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Koscielny</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kulesha</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lawson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Longden</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Massingham</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>McLaren</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2010</pubdate>
            <issue>38 Database</issue>
            <fpage>D557</fpage>
            <lpage>D562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkp972</pubid>
                  <pubid idtype="pmcid">2808936</pubid>
                  <pubid idtype="pmpid" link="fulltext">19906699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>NCBI Gnomon.</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.shtml</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Initial sequencing and analysis of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Linton</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nusbaum</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Devon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dewar</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Doyle</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Funke</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gage</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Heaford</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Howland</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kann</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lehoczky</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>LeVine</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>McEwan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>McKernan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Meldrim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Miranda</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Naylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Raymond</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rosetti</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Santos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sheridan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sougnez</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35057062</pubid>
                  <pubid idtype="pmpid" link="fulltext">11237011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The ENCODE (ENCyclopedia Of DNA Elements) Project.</p>
            </title>
            <aug>
               <au>
                  <cnm>ENCODE Consortium</cnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>636</fpage>
            <lpage>640</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1105136</pubid>
                  <pubid idtype="pmpid" link="fulltext">15499007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Human genome: end of the beginning.</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <fpage>915</fpage>
            <lpage>916</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/431915a</pubid>
                  <pubid idtype="pmpid" link="fulltext">15496902</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>NCBI Reference Sequences: current status, policy and new initiatives.</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Klimke</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2009</pubdate>
            <issue>37 Database</issue>
            <fpage>D32</fpage>
            <lpage>D36</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkn721</pubid>
                  <pubid idtype="pmcid">2686572</pubid>
                  <pubid idtype="pmpid" link="fulltext">18927115</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>The UCSC Genome Browser.</p>
            </title>
            <aug>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Curr Protoc Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>Chapter 1</volume>
            <fpage>Unit 1.4</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">19957273</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>UCSC Genome Table Browser.</p>
            </title>
            <url>http://genome.ucsc.edu/cgi-bin/hgTables</url>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Harrow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Wallin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Farrell</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Loveland</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Ruef</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Suner</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Landrum</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Aken</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ayling</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fernandez-Banet</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>DiCuccio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Schuster</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shkeda</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Amid</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dukhanina</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Frankish</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2009</pubdate>
            <volume>19</volume>
            <fpage>1316</fpage>
            <lpage>1323</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.080531.108</pubid>
                  <pubid idtype="pmcid">2704439</pubid>
                  <pubid idtype="pmpid" link="fulltext">19498102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Distinguishing protein-coding and noncoding genes in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fry</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kamal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Xie</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>19428</fpage>
            <lpage>19433</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0709013104</pubid>
                  <pubid idtype="pmcid">2148306</pubid>
                  <pubid idtype="pmpid" link="fulltext">18040051</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The completion of the Mammalian Gene Collection (MGC).</p>
            </title>
            <aug>
               <au>
                  <cnm>MGC Project Team</cnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2009</pubdate>
            <volume>19</volume>
            <fpage>2324</fpage>
            <lpage>2333</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.095976.109</pubid>
                  <pubid idtype="pmpid" link="fulltext">19767417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Targeted discovery of novel human exons by comparative genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brejov&#225;</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Langton</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Comstock</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ewing</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Oommen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Roe</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gerhard</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Temple</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1763</fpage>
            <lpage>1773</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.7128207</pubid>
                  <pubid idtype="pmcid">2099585</pubid>
                  <pubid idtype="pmpid" link="fulltext">17989246</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The origin of new genes: glimpses from the young and old.</p>
            </title>
            <aug>
               <au>
                  <snm>Long</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Betr&#225;n</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>865</fpage>
            <lpage>875</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1204</pubid>
                  <pubid idtype="pmpid" link="fulltext">14634634</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Recent <it>de novo</it> origin of human protein-coding genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Knowles</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>McLysaght</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2009</pubdate>
            <volume>19</volume>
            <fpage>1752</fpage>
            <lpage>1759</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.095026.109</pubid>
                  <pubid idtype="pmcid">2765279</pubid>
                  <pubid idtype="pmpid" link="fulltext">19726446</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Large-scale copy number polymorphism in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Sebat</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lakshmi</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Troge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Alexander</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lundin</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>M&#229;n&#233;r</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Massa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Navin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lucito</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Healy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hicks</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Reiner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gilliam</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Trask</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Patterson</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Zetterberg</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wigler</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>305</volume>
            <fpage>525</fpage>
            <lpage>528</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1098918</pubid>
                  <pubid idtype="pmpid" link="fulltext">15273396</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Detection of large-scale variation in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Iafrate</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Feuk</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rivera</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Listewnik</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Donahoe</snm>
                  <fnm>PK</fnm>
               </au>
               <au>
                  <snm>Qi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>949</fpage>
            <lpage>951</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1416</pubid>
                  <pubid idtype="pmpid" link="fulltext">15286789</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Personalized copy number and segmental duplication maps using next-generation sequencing.</p>
            </title>
            <aug>
               <au>
                  <snm>Alkan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kidd</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Marques-Bonet</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Aksay</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Antonacci</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hormozdiari</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kitzman</snm>
                  <fnm>JO</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Malig</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mutlu</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sahinalp</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Eichler</snm>
                  <fnm>EE</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2009</pubdate>
            <volume>41</volume>
            <fpage>1061</fpage>
            <lpage>1067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng.437</pubid>
                  <pubid idtype="pmpid" link="fulltext">19718026</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Building the sequence map of the human pan-genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Qian</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ren</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Qin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Blanche</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Cann</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bolund</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kristiansen</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2010</pubdate>
            <volume>28</volume>
            <fpage>57</fpage>
            <lpage>63</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt.1596</pubid>
                  <pubid idtype="pmpid" link="fulltext">19997067</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.</p>
            </title>
            <aug>
               <au>
                  <cnm>International Chicken Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>432</volume>
            <fpage>695</fpage>
            <lpage>716</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03154</pubid>
                  <pubid idtype="pmpid" link="fulltext">15592404</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla.</p>
            </title>
            <aug>
               <au>
                  <snm>Jaillon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Aury</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Noel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Policriti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Clepet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Casagrande</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Choisne</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Aubourg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vitulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Jubin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vezzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Legeai</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hugueney</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Dasilva</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Horner</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mica</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Jublot</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Poulain</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bruy&#232;re</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Billault</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Segurens</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gouyvenoux</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ugarte</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cattonaro</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Anthouard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Vico</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Del Fabbro</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Alaux</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Di Gaspero</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dumas</snm>
                  <fnm>V</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>449</volume>
            <fpage>463</fpage>
            <lpage>467</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature06148</pubid>
                  <pubid idtype="pmpid" link="fulltext">17721507</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>