<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2008-9-1-r5</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Picoeukaryotic sequences in the Sargasso Sea metagenome</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Piganeau</snm>
               <fnm>Gwenael</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>gwenael.piganeau@obs-banyuls.fr</email>
            </au>
            <au id="A2">
               <snm>Desdevises</snm>
               <fnm>Yves</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>yves.desdevises@obs-banyuls.fr</email>
            </au>
            <au id="A3">
               <snm>Derelle</snm>
               <fnm>Evelyne</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>derelle@obs-banyuls.fr</email>
            </au>
            <au id="A4">
               <snm>Moreau</snm>
               <fnm>Herve</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>h.moreau@obs-banyuls.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>UPMC Univ Paris 06, UMR 7628, MBCE, Observatoire Oc&#233;anologique, F-66651, Banyuls/mer, France</p>
            </ins>
            <ins id="I2">
               <p>CNRS, UMR 7628, MBCE, Observatoire Oc&#233;anologique, F-66651, Banyuls/mer, France</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>R5</fpage>
         <url>http://genomebiology.com/2008/9/1/R5</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18179699</pubid>
               <pubid idtype="doi">10.1186/gb-2008-9-1-r5</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>16</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>6</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>7</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>07</day>
               <month>01</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Piganeau et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Picoeukaryote metagenome</p>
      </shorttitle>
      <shortabs>
         <p>Many sequences from picoeukaryotes were found in DNA sequence data assembled from Sargasso seawater.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>With genome sequencing becoming increasingly affordable, shotgun sequencing of environmental microbial communities now generates a challenging amount of sequence data for the scientific community. These sequence data enable the diversity of the microbial world and the metabolic pathways within an environment to be investigated, an achievement previously unthinkable with traditional approaches. DNA sequence data assembled from extracts of 0.8 &#956;m-filtered Sargasso seawater provided an unprecedented view of marine prokaryotic diversity and gene content. Serendipitously, many sequences representing picoeukaryotes (cell size &lt;2 &#956;m) were also present in this dataset. We investigated the picoeukaryotic diversity of this database by searching for sequences containing homologs of eight nuclear <it>anchor </it>genes that are well conserved throughout the eukaryotic lineage, as well as one chloroplastic and one mitochondrial gene.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We found up to 41 distinct eukaryotic scaffolds, with a broad phylogenetic spread on the eukaryotic tree of life. This eukaryotic diversity represents 4% to 18% of the estimated prokaryotic diversity, depending on the ratio of average prokaryotic to eukaryotic genome size. The average eukaryotic scaffold size is 2,909 bp, with one gap every 1,253 bp. Strikingly, the AT frequency of the eukaryotic sequences (51.4%) is significantly lower than the average AT frequency of the metagenome (61.4%).</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Despite similar cell size, eukaryotic sequences of the Sargasso Sea metagenome have higher GC content, suggesting that different environmental pressures affect the evolution of their base composition.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Genome sequencing is becoming increasingly affordable, and shotgun sequencing of DNA from environmental microbial communities now provides the scientific community with a challenging amount of sequence data (see <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp> for a review). These sequence data enable the diversity of the microbial world and the metabolic pathways within environments to be investigated <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, an achievement previously unthinkable with traditional approaches, since an estimated 99% of marine microorganisms cannot be cultured in the laboratory <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
          <p>Picoplankton is defined as the fraction of unicellular organisms with a cell size ranging from 0.2 to 2 or 3 &#956;m <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>; it comprises both prokaryotic and eukaryotic cells, which can be either heterotrophic or autotrophic. The ecology of picoplankton has been intensively investigated over the past decade, and picoplankton now appears to play major roles in oceanic biogeochemical cycles, especially in oligotrophic areas <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. At present, the diversity of prokaryotes, studied mainly by PCR-based 16S rRNA gene approaches <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp> or, more recently, by random sequencing of filtered seawater <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, is better characterized than that of eukaryotes. For example, in samples collected from the Sargasso Sea, filtered through a pore size of 0.8 &#956;m and randomly sequenced, Proteobacteria, Cyanobacteria and species in the CFB phylum (Cytophaga, Flavobacterium, and Bacteroides) dominated <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>; eukaryotic sequences were also reported, but without phylogenetic analysis. Among photosynthetic bacteria, the two genera <it>Prochlorococcus </it>and <it>Synechococcus </it>were clearly dominant, as described for many other areas <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B13">13</abbr></abbrgrp>.</p>
          <p>However, although picoeukaryotes are a minor component of picoplankton in terms of cell number, these organisms, at least the photosynthetic ones, are known to play a major role in primary productivity in oligotrophic areas, where they can represent up to 80% of the autotrophic biomass <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B14">14</abbr></abbrgrp>. Picoeukaryotes usually have a larger cell volume than prokaryotes, are subject to high grazing mortality and have a higher growth rate than cyanobacteria. They can be responsible for 75% of net carbon production in some coastal areas <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Picoeukaryote diversity is much less well studied than that of prokaryotes, although some work has been done recently <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Picoeukaryotic assemblages are mainly composed of phyla such as Haptophytes, Dinoflagellates and Prasinophytes, and some phylogenetic groups within these very broad phyla still lack cytological data <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Some quantitative studies based on <it>in situ </it>hybridization showed that, among these groups, Prasinophytes apparently dominate picoeukaryotes in different oceanic areas, and more precisely the genus <it>Micromonas </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Nevertheless, many other species are found ubiquitously, although they usually represent a minority of cells.</p>
          <p>The most ambitious marine metagenomics project is the Global Ocean Survey (GOS), which aims to sequence picoplankton from many locations across the world's oceans <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The pilot project of this study, based on samples from the Sargasso Sea, was published three years ago <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Its experimental design was geared largely towards examining prokaryote diversity and gene content. However, some very small eukaryotes can pass through the filters used (0.8 &#956;m). This is indeed the case in the Sargasso Sea samples, where 34 sequences of the 18S rRNA gene were identified but not analyzed in detail (Table S5 in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). Among the picoeukaryote species or genera that could pass through the filtration cutoff, <it>Ostreococcus </it>is a likely candidate <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. It is a picophytoplankton genus belonging to the Prasinophytes, a group of widespread green algae thought to have diverged very early from the ancestor of all chloroplast-containing green plants and algae. <it>Ostreococcus </it>is the smallest eukaryotic cell known so far (diameter 0.8 &#956;m) and has the smallest genome described to date for a photosynthetic eukaryote <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. Here, we analyze the picoeukaryotic sequences present in the Sargasso Sea Database (SSD) to assess their sequence quality, diversity and relative abundance, and we discuss the prospects of this approach for evolutionary genomics.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
                <p>Homology-based approach (BLAST) versus phylogenetic tree reconstruction approach</p>
            </st>
             <p>We used BLAST sequence similarity twice: first to retrieve eukaryotic sequences from the SSD, and second to infer the taxonomic affiliation of these sequences. To retrieve the eukaryotic scaffolds from the SSD, we used a reference dataset for each gene chosen as an anchor. We used eight eukaryotic nuclear gene 'anchors', that is, genes that are well conserved across the eukaryotic tree of life: 18S rRNA, 28S rRNA, and the genes encoding elongation factor 1a (EF1a), elongation factor 2 (EF2), the large subunit of RNA polymerase II (RPB1), actin, &#945;-tubulin and &#946;-tubulin. Because these genes are well conserved across eukaryotes, the number of hits varied little between the different species contained in each reference dataset. We even retrieved some prokaryotic scaffolds alongside the eukaryotic ones, because some of the protein-coding anchor genes have distant prokaryotic homologs. We are therefore confident that this approach retrieved all eukaryotic scaffolds containing homologs of these genes. However, inferring the taxonomic affiliation of these scaffolds from local alignments has several drawbacks and has been found to be more error-prone than phylogeny-based taxonomic affiliation <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. For most environmental sequences, the BLAST best hit (BBH) against GenBank is usually the only source of information about taxonomic affiliation. The reliability of this affiliation depends on how well each taxonomic group is represented in GenBank, and this database is strongly biased towards metazoan sequences and, more generally, towards larger organisms. For example, no SSD scaffold containing RPB1 had a Chlorophyta RPB1 as its best hit, simply because the GenBank protein database does not yet contain any Chlorophyta RPB1 gene. 
Taxonomic affiliation is therefore most reliable for genes that have been sequenced in many species across a broad range of taxa, such as the rRNA genes. We also checked the taxonomic affiliation of the rRNA sequences by phylogenetic tree reconstruction (see Additional data files 1 and 2 for the 28S rRNA and 18S rRNA supertrees). The taxonomic affiliation of an SSD scaffold inferred from its BBH was consistent with the tree topology for all rRNA SSD scaffolds whose phylogenetic position could be resolved, that is, for fewer than half of the scaffolds (Additional data files 1 and 2). However, restricting taxonomic assignment to phylogenetic inference is too limiting for such highly fragmented sequence data, for two reasons. First, most of the sequences do not contain enough sites for their phylogenetic position to be fully resolved. Second, highly variable regions have to be discarded from the global alignment, even though they may contain most of the information (for example, the Internal Transcribed Spacer sequences between ribosomal genes).</p>
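             <p>The best-hit assignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes BLAST results in the standard 12-column BLAST+ tabular format (-outfmt 6), an e-value cutoff of 1e-5 that is not given in the text, ranking of hits by bitscore, and a hypothetical lookup table mapping GenBank subject identifiers to taxonomic groups. All identifiers and values in the example are invented.</p>

```python
import csv
import io

# Assumed input: BLAST+ tabular output (-outfmt 6), 12 tab-separated columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
MAX_EVALUE = 1e-5  # hypothetical cutoff; the text does not specify one


def best_hits(blast_tab, max_evalue=MAX_EVALUE):
    """Return {scaffold_id: (subject_id, bitscore)}, keeping each scaffold's top-scoring hit."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # discard weak hits
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)  # on ties, the first hit seen is kept
    return best


def affiliate(best, subject_to_group):
    """Label each scaffold with the taxonomic group of its best hit (BBH)."""
    return {q: subject_to_group.get(s, "unassigned") for q, (s, _) in best.items()}


# Toy, made-up data for illustration only (not SSD results).
example = "\n".join([
    "scaf1\tsubjA\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t250.0",
    "scaf1\tsubjB\t70.0\t300\t90\t0\t1\t300\t1\t300\t1e-30\t180.0",
    "scaf2\tsubjC\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-3\t40.0",
])
groups = {"subjA": "Dinophyceae", "subjB": "Chlorophyta", "subjC": "Ciliophora"}

print(affiliate(best_hits(example), groups))  # {'scaf1': 'Dinophyceae'}
```

             <p>The sketch makes the limitation discussed above concrete: a scaffold can only be assigned to a group that is present among the database subjects, so a group absent from GenBank for a given gene (as for Chlorophyta RPB1) can never be returned as a best hit.</p>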
         </sec>
         <sec>
            <st>
               <p>Picoeukaryotic diversity of the Sargasso Sea metagenome</p>
            </st>
             <p>Depending on the gene searched for, we retrieved 4 (EF2) to 41 (28S rRNA) distinct eukaryotic sequences from the SSD (Table <tblr tid="T1">1</tblr>). This is fewer than the 69 18S rRNA sequences reported in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, because we analyzed the assembled sequence data deposited in GenBank, which does not include the sequences obtained from samples 5 to 8, filtered through larger pore sizes <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> (up to 20 &#956;m; Table S1 in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). The taxonomic distribution of the sequences, as inferred from BLAST searches against GenBank and from phylogenetic analysis, is shown in Table <tblr tid="T1">1</tblr>. Despite the small number of sequences, the species diversity covered is impressive: all five supergroups of the tree of eukaryotes <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> are represented for three of the eight nuclear genes (18S rRNA, RPB1, actin). The most frequent high-scoring BLAST hits were to Dinophyceae sequences (for four of the eight nuclear genes studied). This is consistent with previous studies of marine picoeukaryotic diversity based on hundreds of 18S rRNA sequences from water filtered through larger pore sizes (5 and 3 &#956;m, respectively, in <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B27">27</abbr></abbrgrp>). The second most abundant group is the Streptophyta-Chlorophyta (green plants) group, as might be expected for samples collected from surface water.</p>
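             <p>Because each gene was searched with a reference dataset containing several species, the same scaffold may be hit more than once, so counting distinct sequences per gene, as reported above and in Table 1, requires deduplication. The following is a minimal Python sketch of such a tally, assuming each hit has already been reduced to a hypothetical (gene, scaffold, group) record, for example by a best-hit assignment; the records below are invented for illustration and are not SSD data.</p>

```python
from collections import defaultdict


def tally(records):
    """Count distinct scaffolds per (gene, group) cell and per gene.

    records: iterable of (gene, scaffold_id, group) tuples; duplicates allowed.
    """
    per_cell = defaultdict(set)  # (gene, group) -> distinct scaffold ids
    per_gene = defaultdict(set)  # gene -> distinct scaffold ids
    for gene, scaffold, group in records:
        per_cell[(gene, group)].add(scaffold)
        per_gene[gene].add(scaffold)
    cells = {key: len(ids) for key, ids in per_cell.items()}
    totals = {gene: len(ids) for gene, ids in per_gene.items()}
    return cells, totals


# Hypothetical records for illustration only (not SSD data).
records = [
    ("18S rRNA", "scaf1", "Dinophyceae"),
    ("18S rRNA", "scaf1", "Dinophyceae"),  # same scaffold hit by a second reference sequence
    ("18S rRNA", "scaf2", "Chlorophyta"),
    ("EF2", "scaf3", "Streptophyta"),
]
cells, totals = tally(records)

# Report the range of distinct-sequence counts across genes.
low = min(totals, key=totals.get)
high = max(totals, key=totals.get)
print(f"{totals[low]} ({low}) to {totals[high]} ({high})")  # 1 (EF2) to 2 (18S rRNA)
```

             <p>Per-gene totals are computed from their own set of scaffold identifiers rather than by summing the cells, so a scaffold that appeared under two groups would still be counted only once in its gene's total.</p>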
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Phylogenetic distribution of the eukaryotic SSD scaffolds</p>
               </caption>
               <tblbdy cols="12">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10" ca="center">
                        <p>Number of SSD sequences</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Supergroup</p>
                     </c>
                     <c ca="left">
                        <p>Group</p>
                     </c>
                     <c ca="center">
                        <p>18S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>28S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>EF1a</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>EF2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>RPB1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>actin</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>&#945;-tubulin</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>&#946;-tubulin</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>cox1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rbcL</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>41</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Rhizaria</p>
                     </c>
                     <c ca="left">
                        <p>Cercozoa</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Polycystinea</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Chromalveolates</p>
                     </c>
                     <c ca="left">
                        <p>Apicomplexa</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Ciliophora</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Dinophyceae</p>
                     </c>
                     <c ca="center">
                        <p>10* (3)</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Stramenopiles</p>
                     </c>
                     <c ca="center">
                        <p>1 (1)</p>
                     </c>
                     <c ca="center">
                        <p>6 (4)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Excavates</p>
                     </c>
                     <c ca="left">
                        <p>Euglenozoa</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Heterolobosea</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Plantae</p>
                     </c>
                     <c ca="left">
                        <p>Chlorophyta</p>
                     </c>
                     <c ca="center">
                        <p>3 (2)</p>
                     </c>
                     <c ca="center">
                        <p>3 (3)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                     <c ca="center">
                        <p>1*</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>1*</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Streptophyta</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>2*</p>
                     </c>
                     <c ca="center">
                        <p>7*</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>2*</p>
                     </c>
                     <c ca="center">
                        <p>2*</p>
                     </c>
                     <c ca="center">
                        <p>3*</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Rhodophyta</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Unikonts</p>
                     </c>
                     <c ca="left">
                        <p>Ichthyosporea</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1 (1)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Arthropoda</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>7 (5)</p>
                     </c>
                     <c ca="center">
                        <p>1*</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Bryozoa</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1 (1)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Cnidaria</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>2* (1)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Fungi</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Platyhelminthes</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Urochordata</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1 (1)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>1 (32)</p>
                     </c>
                     <c ca="center">
                        <p>1 (25)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The number of scaffolds whose taxonomic affiliation was confirmed by phylogenetic analysis (Additional data files 1 and 2) is given in parentheses. An asterisk marks the taxonomic affiliation of the largest scaffold for each anchor gene. Groups for which the anchor gene has no representative in the GenBank database are indicated by x.</p>
               </tblfn>
            </tbl>
            <p>Since picoeukaryotes are generally defined as cells smaller than 2 to 3 &#956;m <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B27">27</abbr></abbrgrp>, the available SSD offers a glimpse of only the smallest part of the picoeukaryotic fraction (cell size between 0.22 and 0.8 &#956;m). It is therefore not surprising that larger Prasinophytes, such as <it>Bathycoccus</it>, with a reported cell diameter of around 2 &#956;m, were not found in the data set.</p>
            <p>We found two 18S scaffolds and one 28S scaffold that match an <it>Ostreococcus </it>strain almost perfectly; <it>Ostreococcus </it>is the smallest photosynthetic picoeukaryote known so far <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The two SSD 18S rRNA sequences do not overlap, so they could belong to the same <it>Ostreococcus</it>, closely related to strain RCC143, which is consistent with a previous analysis <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
            <p>The presence of sequences from marine environmental arthropods (the BBH is a marine copepod) and urochordates (the BBH is <it>Ciona</it>) was unexpected, because these organisms are usually much larger than 0.8 &#956;m. Marine environmental sequences from copepods (and from urochordates) have previously been reported in nanoplankton studies (cell size between 2 and 20 &#956;m) but never in picoeukaryote studies. Several hypotheses could explain the presence of such sequences, one being the presence of gametes or cell debris from larger organisms. However, even gametes are usually larger than 0.8 &#956;m, and the DNA in cell debris is usually degraded. Another explanation could be the presence of dissolved DNA fragments in the seawater. Finally, contamination of the filtered batch by non-filtered water cannot be totally ruled out.</p>
            <p>Another ecologically relevant issue is the relative abundance of phototrophic versus heterotrophic organisms among these picoeukaryotes. Even assuming that all Viridiplantae and half of the Dinophyceae are phototrophs <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, we find only 9.5 phototrophs out of 41, that is, less than 24%. This is consistent with the higher diversity of heterotrophs than of autotrophs observed in picoplankton, which suggests that heterotrophs play a complex role in the microbial food web <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. The phylogenetic analysis of the two <it>Ostreococcus </it>18S rRNA sequences found in the SSD showed that they belong to the deep clade (clade B in Figure <figr fid="F1">1</figr>, based on sequences from <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>), even though the seawater was collected close to the surface. A similar observation has been reported for <it>Prochlorococcus </it>in samples collected from a similar location <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Because the four Sargasso Sea samples making up the SSD were collected during winter deep-water mixing, this mixing may explain the presence of some deep-water features in the SSD, which were revealed by a recent study of gene content along the water column <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Thus, the occurrence of deep microbial strains in the surface waters of the Sargasso Sea may be explained by frequent upwelling in this ocean area.</p>
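            <p>The phototroph estimate above weights each taxonomic group by an assumed share of phototrophic members (1 for Viridiplantae, 0.5 for Dinophyceae). The following minimal sketch makes that weighting explicit; the function name and the example counts are hypothetical and are not the SSD values, while the last line reproduces the 9.5 out of 41 figure quoted in the text.</p>

```python
# Hypothetical sketch: expected number and fraction of phototrophs among a set
# of scaffolds, weighting each taxonomic group by an assumed phototrophic share.
# Groups absent from the share mapping are treated as heterotrophic.

SHARE = {"Viridiplantae": 1.0, "Dinophyceae": 0.5}  # assumptions from the text


def phototroph_estimate(counts, phototroph_share):
    """Return (expected phototroph count, fraction of all scaffolds)."""
    total = sum(counts.values())
    expected = sum(n * phototroph_share.get(group, 0.0)
                   for group, n in counts.items())
    return expected, expected / total


# Hypothetical counts for illustration only (NOT the SSD data).
example = {"Viridiplantae": 2, "Dinophyceae": 4, "Other": 4}
print(phototroph_estimate(example, SHARE))  # (4.0, 0.4)

# The figure quoted in the text: 9.5 phototrophs out of 41 scaffolds.
print(f"{9.5 / 41:.1%}")  # 23.2%
```

            <p>Any group whose trophic mode is uncertain can be given an intermediate share in the mapping, which shows directly how sensitive the phototroph fraction is to that assumption.</p>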
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Phylogenetic position of the SSD <it>Ostreococcus</it>-like sequence as inferred from the 18S rRNA sequences in [30]</p>
               </caption>
               <text>
                  <p>Phylogenetic position of the SSD <it>Ostreococcus</it>-like sequence as inferred from the 18S rRNA sequences in [30]. Outgroup sequence, <it>Bathycoccus</it>; OT95, <it>Ostreococcus tauri </it>(clade C); RCC356, RCC344 and MIC106, surface strains (clade A); RCC393 and RCC143, deep strains (clade B); RCC501, surface strain (clade D). Numbers on branches are support values (posterior probability).</p>
               </text>
               <graphic file="gb-2008-9-1-r5-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Picoeukaryotic diversity from other oceanic metagenomes</p>
            </st>
            <p>The SSD represents an unprecedented and still unique sequencing effort, as it corresponds to the assembly of a total of 1.7 &#215; 10<sup>6</sup> reads from four seawater samples from the Sargasso Sea <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. In this pilot study, three other seawater samples were sequenced to a lower depth and left unassembled. One of these additional samples, sample 6, was collected with the filter pore sizes more conventionally used to investigate picoeukaryotes (0.8-3 &#956;m), as opposed to the 0.22-0.8 &#956;m range used for three of the four SSD samples. Unfortunately, the sequencing effort for sample 6 was only 5% of the total sequencing effort used to produce the SSD, or 29% of that for the smallest SSD sample. As a consequence, this sample contained far less eukaryotic material, and we identified only 6 (18S) to 11 (28S) additional eukaryotic paired reads, corresponding to Chromalveolates (Additional data files 1 and 2). We also screened seven additional marine metagenomes from the GOS project, corresponding to samples from seven different open ocean locations, for picoeukaryotic content. These metagenomes are part of the GOS survey <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, and the seawater was filtered to collect organisms of 0.1-0.8 &#956;m. Their picoeukaryotic content was almost negligible (0 to 4 reads matching a eukaryotic rRNA sequence). At least two factors may explain this scarcity of picoeukaryotes. First, the sequencing effort was much lower for these locations (4-15% of the SSD sequencing effort), which reduces the diversity that can be recovered from each sample. Second, the collection filters used for these metagenomes had a smaller pore size (0.1 &#956;m compared with 0.22 &#956;m for the SSD), which retains more of the smallest prokaryotic cells and thus lowers the eukaryotic to prokaryotic ratio.
The collection filter size seems to have a major effect on picoeukaryotic sampling, since the one SSD sample collected with a 0.1 &#956;m filter has a lower picoeukaryotic content than the three other SSD samples collected with a 0.22 &#956;m filter, despite its greater sequencing depth (see Table S5 in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). Therefore, this study focuses on eukaryotic sequence diversity in the largest metagenome from the Sargasso Sea, the SSD.</p>
         </sec>
         <sec>
            <st>
               <p>Picoeukaryotic versus prokaryotic content and sequence features</p>
            </st>
            <p>We retrieved 41 distinct scaffolds containing 28S rRNA sequences and 558 distinct scaffolds containing 16S rRNA from the SSD. Assuming that the number of rRNA repeats per genome is similarly distributed in eukaryotes and prokaryotes, that is, that counting rRNA-containing scaffolds to estimate species richness is biased in the same way in both groups, we can estimate the eukaryotic/prokaryotic species number ratio, <it>&#961;</it>, as <it>&#961; </it>= 41/558 = 7.3%. However, the rRNA gene copy number is known to vary in both prokaryotes <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> and picoeukaryotes <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Because duplications are more frequent in eukaryotic genomes, some eukaryotic species carry several orders of magnitude more rRNA copies than prokaryotic species. The above ratio is therefore likely to be an overestimate. The average number of distinct eukaryotic SSD scaffolds over the eight nuclear genes is about 20 (164/8 = 20.5; Table <tblr tid="T2">2</tblr>), so it seems more realistic to assume <it>&#961; </it>= 20.5/558 = 3.7%. This, in turn, is probably an underestimate, because eukaryotic genomes are, on average, larger than prokaryotic ones. Assuming equal species abundance, the probability of sequencing orthologous 100 bp regions in two genomes of size G1 = 10 Mb, that is, the probability of identifying two distinct species, is lower by a factor of G1/G2 = 10 (one order of magnitude) than the probability of sequencing orthologous 100 bp regions in two genomes of size G2 = 1 Mb. The estimate of <it>&#961; </it>must therefore be corrected for the difference in genome size between prokaryotic and eukaryotic organisms. The eukaryotic/prokaryotic genome size ratio cannot be estimated precisely, but a minimum of five seems realistic (the <it>Ostreococcus</it>/<it>Synechococcus </it>genome size ratio is 12.6 Mb/2.4 Mb = 5.25).
Thus, assuming a minimum picoeukaryotic/prokaryotic genome size ratio of 5, <it>&#961; </it>= 3.7% &#215; 5 = 18.5%, which is consistent with recent experimental estimates of the relative abundance of picoeukaryotes and prokaryotes in surface coastal water <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
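            <p>The successive estimates of <it>&#961; </it>can be reproduced in a few lines. This is a sketch of the arithmetic only, not the authors' pipeline: the scaffold counts come from Table 2, and the genome-size factor of 5 is the assumed minimum stated in the text. Variable names are illustrative.</p>

```python
# Sketch of the successive estimates of rho, the eukaryotic/prokaryotic
# species number ratio. Counts are from Table 2; MIN_FACTOR is the assumed
# minimum genome size ratio used in the text.

N_16S = 558  # SSD scaffolds containing 16S rRNA (prokaryotes)

# Distinct SSD scaffolds per nuclear anchor gene (Table 2).
NUCLEAR = {
    "18S rRNA": 38, "28S rRNA": 41, "EF1a": 11, "EF2": 4,
    "RPB1": 30, "actin": 12, "alpha-tubulin": 13, "beta-tubulin": 15,
}

# 1) 28S versus 16S scaffolds: inflated by high eukaryotic rRNA copy numbers.
rho_rrna = NUCLEAR["28S rRNA"] / N_16S

# 2) Mean number of scaffolds over the eight nuclear genes instead.
mean_nuclear = sum(NUCLEAR.values()) / len(NUCLEAR)
rho_mean = mean_nuclear / N_16S

# 3) Genome size correction, following the text's argument that a genome
#    k times larger is sampled roughly k times less often at a given locus.
genome_ratio = 12.6 / 2.4  # Ostreococcus / Synechococcus genome size (Mb)
MIN_FACTOR = 5             # conservative lower bound used in the text
rho_corrected = rho_mean * MIN_FACTOR

print(f"rho (28S/16S)      = {rho_rrna:.1%}")
print(f"mean nuclear count = {mean_nuclear}")
print(f"rho (mean/16S)     = {rho_mean:.1%}")
print(f"genome size ratio  = {genome_ratio:.2f}")
print(f"rho corrected      = {rho_corrected:.1%}")
```

            <p>Carrying full precision gives a corrected ratio of about 18.4%; the 18.5% in the text comes from rounding <it>&#961; </it>to 3.7% before multiplying by 5.</p>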
            <p>Because some SSD scaffolds contained more than one anchor gene (18S and 28S rRNA, &#945;- and &#946;-tubulin), the total number of distinct eukaryotic scaffolds for all nuclear genes is 128. The nuclear eukaryotic SSD scaffolds differ from the prokaryotic and organellar scaffolds in two striking ways (Table <tblr tid="T2">2</tblr>). First, the nuclear scaffolds are, on average, about 25% shorter than the prokaryotic and organellar scaffolds (Student's t-test between SSD scaffolds containing 16S rRNA and SSD scaffolds containing 18S rRNA, <it>p </it>value &lt; 10<sup>-7</sup>). The shorter length of the eukaryotic nuclear scaffolds can be explained in at least three ways. First, it could simply reflect the genome size difference described above, since the probability of finding two overlapping sequences, and thus of obtaining longer assemblies, is smaller for larger genomes. Second, it may reflect the greater abundance of prokaryotic and organellar genomes relative to eukaryotic nuclear genomes. A greater number of prokaryotic genomes is the direct consequence of a greater number of prokaryotic cells, as estimated experimentally <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, whereas a greater number of organellar genomes could reflect a higher number of genome copies in the organelles than in the nucleus. Our result suggests that organellar DNA may be present in more copies than nuclear DNA in picoeukaryotes, as observed in the green alga <it>Chlamydomonas </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Third, the shorter length of eukaryotic scaffolds could also be due to differences in DNA extraction and sequencing efficiency between circular and linear DNA, or between sequences of different base composition.</p>
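            <p>The first explanation can be illustrated with the classical Lander-Waterman model of random shotgun sequencing, which was not used in the original study. The sketch below assumes that the same hypothetical number of 800 bp reads is drawn from each genome and that any overlap suffices to join two reads. Under these assumptions, with coverage c = NL/G, the expected number of contigs is N e<sup>-c</sup> and the expected contig length is L(e<sup>c</sup> - 1)/c, so contigs shrink as genome size G grows.</p>

```python
import math

# Lander-Waterman sketch (zero minimum overlap). The read number, read length
# and genome sizes below are hypothetical, chosen only for illustration.


def lw_coverage(n_reads, read_len, genome_size):
    """Mean coverage c = N * L / G."""
    return n_reads * read_len / genome_size


def lw_expected_contigs(n_reads, read_len, genome_size):
    """Expected number of contigs (islands, singletons included): N * exp(-c)."""
    c = lw_coverage(n_reads, read_len, genome_size)
    return n_reads * math.exp(-c)


def lw_expected_contig_length(n_reads, read_len, genome_size):
    """Expected contig length in bp: L * (exp(c) - 1) / c."""
    c = lw_coverage(n_reads, read_len, genome_size)
    return read_len * math.expm1(c) / c  # expm1 stays accurate for small c


# Same hypothetical read budget for a prokaryote-sized and a larger genome.
for label, g in [("2.4 Mb genome", 2.4e6), ("12.6 Mb genome", 12.6e6)]:
    print(label,
          round(lw_expected_contigs(2000, 800, g)),
          round(lw_expected_contig_length(2000, 800, g)))
```

            <p>At very low coverage, as expected for most organisms in a metagenome, the expected contig length tends to the read length itself, so even modest differences in genome size translate into shorter average assemblies.</p>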
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Sequence features of the picoeukaryotic scaffolds retrieved from the SSD, compared with all SSD scaffolds and with scaffolds containing 16S rRNA</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Number of scaffolds</p>
                     </c>
                     <c ca="center">
                        <p>Average length* (bp)</p>
                     </c>
                     <c ca="center">
                        <p>Length* of largest scaffold (kbp)</p>
                     </c>
                     <c ca="center">
                        <p>Average distance between gaps (bp)</p>
                     </c>
                     <c ca="center">
                        <p>Average AT content (%) (minimum-maximum)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>All SSD</p>
                     </c>
                     <c ca="center">
                        <p>232,141</p>
                     </c>
                     <c ca="center">
                        <p>2,165</p>
                     </c>
                     <c ca="center">
                        <p>205.9</p>
                     </c>
                     <c ca="center">
                        <p>920</p>
                     </c>
                     <c ca="center">
                        <p>61.4 (16.4-99.2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>16S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>558</p>
                     </c>
                     <c ca="center">
                        <p>3,942</p>
                     </c>
                     <c ca="center">
                        <p>76.1</p>
                     </c>
                     <c ca="center">
                        <p>1,103</p>
                     </c>
                     <c ca="center">
                        <p>59.9 (38.6-74.7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>18S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>2,673</p>
                     </c>
                     <c ca="center">
                        <p>24.3</p>
                     </c>
                     <c ca="center">
                        <p>1,028</p>
                     </c>
                     <c ca="center">
                        <p>51.9 (32.5-71.1)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>28S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>41</p>
                     </c>
                     <c ca="center">
                        <p>1,760</p>
                     </c>
                     <c ca="center">
                        <p>4.3</p>
                     </c>
                     <c ca="center">
                        <p>880</p>
                     </c>
                     <c ca="center">
                        <p>50.5 (34.7-66.7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EF1a</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>11,301</p>
                     </c>
                     <c ca="center">
                        <p>86.6</p>
                     </c>
                     <c ca="center">
                        <p>1,518</p>
                     </c>
                     <c ca="center">
                        <p>46.0 (33.7-67.2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EF2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>4,163</p>
                     </c>
                     <c ca="center">
                        <p>8.8</p>
                     </c>
                     <c ca="center">
                        <p>1,342</p>
                     </c>
                     <c ca="center">
                        <p>41.6 (38.5-44.6)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>RPB1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>1,724</p>
                     </c>
                     <c ca="center">
                        <p>2.7</p>
                     </c>
                     <c ca="center">
                        <p>862</p>
                     </c>
                     <c ca="center">
                        <p>61.9 (34.9-76.7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>actin</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>2,844</p>
                     </c>
                     <c ca="center">
                        <p>17.2</p>
                     </c>
                     <c ca="center">
                        <p>910</p>
                     </c>
                     <c ca="center">
                        <p>41.5 (30.1-60.3)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>&#945;-tubulin</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>1,986</p>
                     </c>
                     <c ca="center">
                        <p>6.7</p>
                     </c>
                     <c ca="center">
                        <p>864</p>
                     </c>
                     <c ca="center">
                        <p>48.6 (31.3-73.7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>&#946;-tubulin</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>1,887</p>
                     </c>
                     <c ca="center">
                        <p>6.7</p>
                     </c>
                     <c ca="center">
                        <p>832</p>
                     </c>
                     <c ca="center">
                        <p>46.2 (30.5-73.7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>All nuclear</p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>2,910</p>
                     </c>
                     <c ca="center">
                        <p>86.6</p>
                     </c>
                     <c ca="center">
                        <p>1,253</p>
                     </c>
                     <c ca="center">
                        <p>51.4 (30.1-76.7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>cox1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>3,962</p>
                     </c>
                     <c ca="center">
                        <p>19.4</p>
                     </c>
                     <c ca="center">
                        <p>1,230</p>
                     </c>
                     <c ca="center">
                        <p>66.3 (58.6-70.5)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>rbcL</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>4,173</p>
                     </c>
                     <c ca="center">
                        <p>11.3</p>
                     </c>
                     <c ca="center">
                        <p>2,033</p>
                     </c>
                     <c ca="center">
                        <p>64.7 (59.4-67.6)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Excluding gaps.</p>
               </tblfn>
            </tbl>
            <p>The second difference is that the AT content of the SSD eukaryotic scaffolds we retrieved is much lower than the average AT content of the SSD (51.4% versus 61.4%; Student's <it>t</it>-test, <it>p </it>value &lt; 10<sup>-15</sup>; Figure <figr fid="F2">2</figr>). The few eukaryotic sequences we retrieved from the seven GOS open ocean locations also have a lower AT content (52.2%) than the AT content observed in these metagenomes <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. To test whether this observation is a consequence of a GC-biased anchor dataset, we compared the base composition of our anchor dataset to the average base composition of the genes in the two complete picoeukaryotic genomes of <it>Ostreococcus tauri </it>and <it>Cyanidioschyzon merolae</it>. The eight nuclear anchor genes are in fact AT-biased in <it>O. tauri </it>(<it>n </it>= 8, <it>f</it><sub><it>AT </it></sub>= 45.0% versus <it>n </it>= 7,166, <it>f</it><sub><it>AT </it></sub>= 39.6%, <it>p </it>value = 0.003), and their AT content does not differ significantly from the average AT content of the genes in <it>C. merolae </it>(<it>n </it>= 8, <it>f</it><sub><it>AT </it></sub>= 44.4% versus <it>n </it>= 6,699, <it>f</it><sub><it>AT </it></sub>= 44.7%, <it>p </it>value = 0.79). Foerstner and colleagues <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> argued that the environment shapes the nucleotide composition of genomes, based on the observation that Sargasso Sea prokaryotic sequences have a higher AT content than sequences from other environments, although the causes of this compositional bias are not yet clear.</p>
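            <p>The comparison of AT content between the eukaryotic scaffolds and the whole SSD can be illustrated with a minimal sketch of a two-sample Student's <it>t</it>-test with pooled variance, applied to per-scaffold AT frequencies computed excluding gaps. With samples as large as the whole SSD, the <it>t</it> distribution is practically indistinguishable from the standard normal, so the two-sided <it>p </it>value is approximated here with the complementary error function; this approximation and the function name are assumptions of the sketch, not the authors' implementation.</p>
            <p>
```python
import math
import statistics


def student_t(sample1: list[float], sample2: list[float]) -> tuple[float, float]:
    """Two-sample Student's t statistic (pooled variance) and a two-sided p-value.

    The p-value uses the standard normal approximation, which is adequate
    only when the degrees of freedom (n1 + n2 - 2) are large.
    """
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.fmean(sample1), statistics.fmean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled estimate of the common variance
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for a standard normal Phi
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return t, p_two_sided
```
            </p>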
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>AT frequency distribution in the 128 eukaryotic SSD scaffolds retrieved (white bars) versus AT frequency distribution in the total SSD scaffolds (black bars)</p>
               </caption>
               <text>
                  <p>AT frequency distribution in the 128 eukaryotic SSD scaffolds retrieved (white bars) versus AT frequency distribution in the total SSD scaffolds (black bars).</p>
               </text>
               <graphic file="gb-2008-9-1-r5-2"/>
            </fig>
            <p>We compared the AT composition of the SSD eukaryotic sequences with that of their GenBank BBH and found no trend in average base composition differences over the alignments (exact test on the difference in AT content between each pair of sequences, <it>p </it>value = 0.93, <it>n </it>= 128); restricting the comparison to non-marine BBH did not reveal a significant difference either (<it>p </it>value = 0.83, <it>n </it>= 30). We also compared the AT composition of the 30 eukaryotic SSD scaffolds (out of 128) that had a BLAST hit against the soil metagenome (E-value &lt; 10<sup>-6</sup>) <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> with that of their hits, and found no significant difference in base composition over the alignments (<it>p </it>value = 0.76, <it>n </it>= 30).</p>
            <p>Smaller genome sizes and the higher cost of synthesizing G and C compared with A or T nucleotides have been invoked as possible explanations for differences in base composition between genomes, through their indirect influence on growth rate <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Global environmental features (nutrient availability, organism density, ecosystem complexity) may induce different pressures on growth rates and, thus, on genomic base composition <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. This analysis suggests that base composition in picoeukaryotes is not subject to the same selective or neutral forces as that of prokaryotic sequences in the Sargasso Sea.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We have shown that the SSD contains genomic data from at least 41 eukaryotes with cell sizes below 0.8 &#956;m, with representatives in the five supergroups of the eukaryote tree of life. This number represents 4-18% of the prokaryotic diversity in this dataset, in agreement with recent experimental estimates in surface water <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. We cannot rule out the possibility that some of these sequences come from larger organisms that contaminated some of the water samples.</p>
         <p>Also, the assembly of environmental sequences is a great methodological challenge, and erroneous assembly may lead to an over- or underestimation of the number of distinct species. However, this is unlikely for the SSD eukaryotic data we retrieved, because the eukaryotic scaffolds are very short (about the size of the anchor genes) and most of them are 'mini-scaffolds' (consisting of a read and its mate pair, as described in the supplementary information of <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>).</p>
         <p>Overall, the eukaryotic scaffolds were shorter than the prokaryotic ones, consistent with larger genome sizes and/or lower cell numbers for picoeukaryotes, and had a lower AT content. These sequence data provide information for studying evolutionary genomics in marine picoeukaryotes.</p>
         <p>Most questions in evolutionary genomics require either a complete genome or a representative subset of it. With the sequence of one organism, we can address issues such as the evolution of codon usage bias, the evolution of base composition variation, the dynamics of duplication or the dynamics of transposable elements. With several genomes sequenced from phylogenetically related species, we can tackle similar issues from a phylogenetic perspective (for example, whether a genomic process took place before or after a speciation event). We can also compare homologous sequences from two species to detect positive selection on amino-acid composition <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> or putative regulatory sequences of gene expression by phylogenetic footprinting <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. However, the distinction between orthologous sequences (descending from a common ancestor by speciation) and paralogous sequences (descending from a common ancestor by duplication) is essential for evolutionary genomics <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. This kind of information can be obtained only from a well-annotated complete genome, not from fragmented and highly gapped environmental sequence data. Nevertheless, environmental sequences such as those of the Sargasso Sea can provide valuable additional data for evolutionary genomics, provided that a complete genome is already available. This will soon be the case within the class Prasinophyceae (Chlorophyta), since seven genome projects are underway: three <it>Ostreococcus</it>, three <it>Micromonas </it>and one <it>Bathycoccus</it>. For example, 13% of the 8,166 annotated coding sequences of the <it>O. tauri </it>genome <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> have high-scoring BLAST matches in the SSD (score > 105 and E-value &lt; 10<sup>-26</sup>), and 41% of the matching scaffolds contain synteny groups of up to seven genes in the same order and orientation in both the SSD scaffold and the <it>O. tauri </it>genome <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. These metagenomic data could also be used to improve a genome assembly by bridging a gap between two genes, provided that the genomic coverage of the species in the SSD is high enough.</p>
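         <p>The synteny figure above depends on how a conserved gene group is defined, which the text does not specify. As an illustration only, the sketch below assumes a deliberately strict definition: adjacent genes on an SSD scaffold count as syntenic if their best matches are also adjacent on the same <it>O. tauri </it>chromosome, in the same order and with the same relative orientation (allowing the whole scaffold to be reverse-complemented). The data structures and the function name are hypothetical.</p>
         <p>
```python
from typing import Optional


def longest_collinear_run(scaffold_genes: list[tuple[str, str]],
                          reference: dict[str, tuple[str, int, str]]) -> int:
    """Longest run of adjacent scaffold genes that are also adjacent, in the same
    order and relative orientation, on one reference chromosome.

    scaffold_genes: (ID of the best-matching reference gene, strand '+' or '-'),
                    listed in scaffold order.
    reference: gene ID -> (chromosome, index of the gene along it, strand).
    """
    best = run = 0
    prev: Optional[tuple[str, int, int]] = None  # (chromosome, index, orientation)
    for gene, strand in scaffold_genes:
        if gene not in reference:
            run, prev = 0, None  # an unmatched gene breaks the run
            continue
        chrom, index, ref_strand = reference[gene]
        # +1: same orientation as the reference; -1: scaffold is reversed
        orient = 1 if strand == ref_strand else -1
        # A reversed scaffold walks the reference chromosome backwards
        extends = (prev is not None and chrom == prev[0]
                   and orient == prev[2] and index == prev[1] + orient)
        run = run + 1 if extends else 1
        prev = (chrom, index, orient)
        best = max(best, run)
    return best
```
         </p>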
         <p>Another potentially crucial output of metagenomes is the retrieval of new, mainly free-living, eukaryotic sequences. This could be of great significance for phylogenetic studies and help to resolve the deep branches of the eukaryotic tree of life by providing sequences from missing links <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. It is striking that the Sargasso Sea data, despite a relatively small number of different species for the same gene, contain such a broad phylogenetic spread, with representatives from the five branches of the eukaryotic tree of life <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Since the unit of analysis in a metagenome is an assembled sequence with no further information on the source organism, assemblies need to be as long and reliable as possible to provide maximum phylogenetic information (maximum number of genes) for each organism sequenced. Unfortunately, the assembly of sequences from metagenomes is a great methodological challenge <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, and the average length of an SSD picoeukaryotic sequence is about the size of a single gene, that is, around 2,000 bp for rRNA. The development of phylogenetic methods that deal with partial alignments (supertrees) enables phylogenetic inference from gapped data (for example, see references in <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B44">44</abbr></abbrgrp>), thus partly overcoming this problem.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Environmental sequencing efforts specifically targeting picoeukaryotes, with less emphasis on prokaryotes, are needed. This would enable better coverage and, thus, longer assemblies of eukaryotic genomes. The objective of the Sargasso Sea environmental sequencing was clearly to obtain prokaryotic sequences, and this was done by using a very small filter porosity, retaining organisms between 0.22 and 0.8 &#956;m. The simplest way to improve the representation of picoeukaryotes in a metagenome would be to shift the filtration range to between 0.5 and 2 &#956;m and to increase the sequencing effort to at least one million reads. This would eliminate a large fraction of the prokaryotes and increase the proportion of picoeukaryotes in the filtered sample.</p>
      </sec>
      <sec>
         <st>
            <p>Material and methods</p>
         </st>
         <sec>
            <st>
               <p>Data</p>
            </st>
            <p>The SSD sequence data were retrieved from GenBank (accession number <ext-link ext-link-type="gen" ext-link-id="AACY01000000">AACY01000000</ext-link>, Locus <ext-link ext-link-type="gen" ext-link-id="CH004737">CH004737</ext-link> to <ext-link ext-link-type="gen" ext-link-id="CH236877">CH236877</ext-link>). These data constitute the database of scaffolds not associated with any particular organism and were obtained from samples 1-4, prefiltered through 0.8 &#956;m and collected on one 0.1 &#956;m and three 0.22 &#956;m filters (Table S1 in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). The reads corresponding to this assembly, the reads obtained from sample 6 (prefiltered through 3 &#956;m and collected on 0.8 &#956;m filters) and the reads from the seven other open ocean locations were downloaded from the CAMERA database <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>. The <it>O. tauri </it>gene content was retrieved from GenBank (accession numbers <ext-link ext-link-type="gen" ext-link-id="CR954201">CR954201</ext-link>-<ext-link ext-link-type="gen" ext-link-id="CR954220">CR954220</ext-link>).</p>
            <p>To assess picoeukaryotic diversity, we used eight eukaryotic nuclear gene 'anchors', that is, genes that are well conserved across the eukaryotic tree of life: 18S rRNA, 28S rRNA, and the genes encoding EF1a, EF2, RPB1, actin, &#945;-tubulin and &#946;-tubulin. For each of the six nuclear protein-coding genes, we retrieved the seven corresponding genes from the KOG database <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, from <it>Arabidopsis thaliana</it>, <it>Caenorhabditis elegans</it>, <it>Drosophila melanogaster</it>, <it>Homo sapiens</it>, <it>Saccharomyces cerevisiae</it>, <it>Schizosaccharomyces pombe </it>and <it>Encephalitozoon cuniculi</it>, together with the corresponding <it>O. tauri </it>gene. We then extended each reference dataset by searching GenBank for representatives of these genes in each of the supergroups of the eukaryotic tree of life <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. The total number of genes in each dataset was 17 (EF1a), 15 (EF2), 21 (RPB1), 22 (actin), 20 (&#945;-tubulin) and 20 (&#946;-tubulin).</p>
            <p>We also used one chloroplast gene, <it>rbcL</it>, which encodes the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase, and one mitochondrial gene, <it>cox1</it>, which encodes subunit I of cytochrome <it>c </it>oxidase. We retrieved 21 <it>rbcL </it>and 31 <it>cox1 </it>sequences from GenBank, randomly sampling representatives in each of the five supergroups of the eukaryotic tree of life.</p>
            <p>To assess prokaryotic diversity in the same dataset, we used 16S rRNA. The reference dataset for 16S rRNA was retrieved from the RDPII database <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> and contained 4,409 sequences. When several sequences shared the same taxonomic affiliation (given by the first word of the organism name, for example, <it>Persephonella</it>), we randomly kept one of them, which reduced the dataset to 906 sequences.</p>
            <p>The reference datasets for 18S and 28S rRNA were retrieved from GenBank using the ACNUC retrieval system <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, excluding sequences from metazoans. As for the 16S rRNA dataset, we randomly chose one sequence when several organisms shared the same taxonomic affiliation. We thus obtained a reference dataset of 252 18S and 246 28S sequences.</p>
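            <p>The dereplication applied to the 16S, 18S and 28S reference datasets, keeping one randomly chosen sequence per taxonomic affiliation (the first word of the organism name, usually the genus), can be sketched as follows. The input format, a list of (organism name, sequence identifier) pairs, the function name and the fixed random seed are illustrative assumptions.</p>
            <p>
```python
import random
from collections import defaultdict


def one_per_genus(records: list[tuple[str, str]], seed: int = 0) -> dict[str, str]:
    """Keep one randomly chosen sequence ID per taxonomic affiliation.

    records: (organism name, sequence ID) pairs, e.g. ("Persephonella marina", "id1").
    The affiliation is taken to be the first word of the organism name.
    """
    by_genus: dict[str, list[str]] = defaultdict(list)
    for organism, seq_id in records:
        genus = organism.split()[0]
        by_genus[genus].append(seq_id)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    # Sorting makes the sequence of random draws independent of input order
    return {genus: rng.choice(ids) for genus, ids in sorted(by_genus.items())}
```
            </p>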
         </sec>
         <sec>
            <st>
               <p>Picoeukaryotic diversity and abundance</p>
            </st>
            <p>To assess the diversity and abundance of picoeukaryotes in this dataset, we performed a BLAST search <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> of the ten eukaryotic 'anchor' genes against the SSD, using blastn for RNA genes and tblastn for proteins. We retrieved all Sargasso Sea scaffolds matching these genes with E-values smaller than 10<sup>-14 </sup>(blastn) and 10<sup>-7 </sup>(tblastn), and then searched these scaffolds against GenBank for taxonomic affiliation: blastn against GenBank for scaffolds containing one of the two rRNA genes, and blastx against GenBank's protein database for scaffolds containing one of the eight 'anchor' protein genes. We deduced the taxonomic affiliation of each environmental sequence from that of its BBH when the E-value of the BBH was smaller than 10<sup>-18 </sup>(blastn) or 10<sup>-10 </sup>(blastx); otherwise, we considered the affiliation unknown.</p>
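            <p>The two filtering steps above can be sketched as simple threshold rules. The E-value cutoffs are those stated in the text; the function names and the representation of a BBH as a (taxon, E-value) pair are assumptions, and running BLAST and parsing its output are not shown.</p>
            <p>
```python
from typing import Optional

# Cutoffs stated in the text. Step 1: retrieving SSD scaffolds with the anchor genes.
SEARCH_CUTOFF = {"blastn": 1e-14, "tblastn": 1e-7}
# Step 2: accepting the taxonomy of the GenBank best blast hit (BBH).
BBH_CUTOFF = {"blastn": 1e-18, "blastx": 1e-10}


def retrieve_scaffold(program: str, evalue: float) -> bool:
    """True if an anchor-gene hit on an SSD scaffold is strong enough to keep the scaffold."""
    return SEARCH_CUTOFF[program] > evalue  # E-value strictly smaller than the cutoff


def assign_taxonomy(program: str, bbh_taxon: Optional[str],
                    bbh_evalue: Optional[float]) -> str:
    """Taxon of the BBH if its E-value is below the program-specific cutoff, else 'unknown'."""
    if bbh_taxon is None or bbh_evalue is None:
        return "unknown"  # the scaffold had no GenBank hit
    return bbh_taxon if BBH_CUTOFF[program] > bbh_evalue else "unknown"
```
            </p>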
         </sec>
         <sec>
            <st>
               <p>Phylogeny of rRNA SSD scaffolds</p>
            </st>
            <p>The SSD scaffolds matching a gene of the anchor 18S and 28S datasets, together with the corresponding anchor gene and GenBank BBH, were aligned with MAFFT version 5 <abbrgrp><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>, and the alignment was checked by eye with Se-Al v2.0a11 <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. Ambiguous regions were deleted from the alignment, giving a final length of 3,045 bp for the 28S rRNA dataset (90 sequences in total) and 1,374 bp for the 18S rRNA dataset (61 sequences in total).</p>
            <p>The SSD scaffolds differ in length and together cover almost all of the 18S and/or 28S rRNA genes. These length differences made it difficult to reconstruct a phylogenetic tree directly from the whole matrix of aligned sequences. Thus, overlapping subsets of sequences were defined to include the maximum possible number of species, provided that the aligned sequences were long enough to reconstruct well-supported phylogenetic trees. The trees obtained from these datasets are hereafter called 'subtrees'. They were reconstructed by Bayesian analysis with MrBayes 3.1.2 <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, using four chains of 10<sup>6 </sup>generations and the best evolutionary models chosen via hierarchical likelihood ratio tests with MrModelTest 2.2 <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp> (the MrModeltest 2.2 program is distributed by the author, Evolutionary Biology Centre, Uppsala University). The burn-in was set to 20% of the sampled trees (1% of the number of generations), and only clades with at least 90% posterior probability were kept as conservative estimates in the final consensus tree. Thirty-one subtrees (28S rRNA) and 23 subtrees (18S rRNA) were constructed.</p>
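            <p>The burn-in and support threshold can be made concrete with a small sketch: after discarding the first 20% of the sampled trees, the posterior probability of a clade is the fraction of the remaining trees that contain it, and only clades with a value of at least 0.90 are retained. Each sampled tree is simplified here to its set of clades (frozensets of taxon names); reading MrBayes tree files is not shown, and the function name is hypothetical.</p>
            <p>
```python
from collections import Counter


def supported_clades(sampled_trees: list[set[frozenset[str]]],
                     burnin_fraction: float = 0.20,
                     min_support: float = 0.90) -> dict[frozenset[str], float]:
    """Posterior probability of each clade after burn-in, keeping well-supported clades.

    Each sampled tree is represented by its set of clades (each a frozenset of taxa),
    listed in the order in which the chain sampled them.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]  # discard the burn-in samples
    if not kept:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    posterior = {clade: c / len(kept) for clade, c in counts.items()}
    return {clade: pp for clade, pp in posterior.items() if pp >= min_support}
```
            </p>
            <p>The sampling frequency is not stated in the text; if, for instance, trees were sampled every 20 generations (an assumption), 10<sup>6 </sup>generations would yield 50,000 trees, and a 20% burn-in would discard 10,000 of them, a number equal to 1% of the generations.</p>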
            <p>All subtrees were combined into a supertree with RadCon <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, using matrix representation with parsimony with the Baum <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> and Ragan <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> coding scheme <abbrgrp><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr></abbrgrp>. The combined matrix was subjected to a parsimony analysis with the heuristic algorithm implemented in PAUP* <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>, using 500 random addition replicates and the tree bisection-reconnection branch-swapping algorithm, holding a maximum of 1,000 trees for each replicate. The 498,000 (28S) and 423,000 (18S) most parsimonious trees obtained were combined in a majority-rule consensus. Supertrees computed from subtrees obtained via Bayesian analysis and maximum likelihood were not significantly different (<it>p </it>&lt; 0.01, symmetric-difference test <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>, computed with PAUP* 4.0), and only supertrees computed from Bayesian inferred subtrees are presented. To assign an SSD scaffold to a taxonomic group, the branch support of this sequence within a taxonomic group had to be over 80%; otherwise, we considered the taxonomic affiliation of the SSD scaffold unresolved by the supertree topology.</p>
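            <p>Matrix representation with parsimony recodes each input tree as binary characters, one per informative clade: taxa inside the clade score 1, taxa present in that tree but outside the clade score 0, and taxa absent from that tree are coded as missing ('?'). The sketch below builds such a matrix from subtrees given as (taxon set, list of clades); it illustrates the Baum and Ragan coding only, not RadCon itself, and the input representation and function name are assumptions. An all-zero hypothetical outgroup, often added to root the matrix, is omitted here, and the resulting matrix would still have to be analysed with a parsimony program such as PAUP*.</p>
            <p>
```python
def mrp_matrix(subtrees: list[tuple[set[str], list[frozenset[str]]]]) -> dict[str, str]:
    """Baum/Ragan MRP coding: one binary character per informative clade of each subtree.

    Each subtree is (taxa in that subtree, clades of that subtree). Returns one
    character string per taxon, using '1' (in clade), '0' (in subtree, outside
    clade) and '?' (taxon absent from the subtree).
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in subtrees)))
    rows: dict[str, list[str]] = {taxon: [] for taxon in all_taxa}
    for taxa, clades in subtrees:
        for clade in clades:
            # Singletons and the clade spanning the whole subtree carry no grouping information
            if len(clade) == 1 or clade == taxa:
                continue
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in taxa:
                    rows[taxon].append("0")
                else:
                    rows[taxon].append("?")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}
```
            </p>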
         </sec>
         <sec>
            <st>
                <p>Phylogenetic position of the SSD <it>Ostreococcus</it>-like 18S sequence</p>
            </st>
            <p>The 18S rRNA sequences from several <it>Ostreococcus </it>strains <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> and the first BLAST hit of the <it>O. tauri </it>18S sequence in the SSD were aligned manually. This alignment was used to build a phylogenetic tree by Bayesian analysis with MrBayes 3.1.1 <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, using four chains of 10<sup>6 </sup>generations and the best evolutionary model chosen via hierarchical likelihood ratio tests with MrModelTest 2.2 <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. The best model for 18S rRNA was Hasegawa-Kishino-Yano with gamma-distributed rate variation (HKY+&#915;). Several analyses were run independently from random starting trees to assess convergence. The tree was rooted using a related prasinophyte, <it>Bathycoccus</it>.</p>
         </sec>
         <sec>
            <st>
               <p>Sequence analysis</p>
            </st>
            <p>For each SSD scaffold, we computed the length, the number of gaps, the distance between gaps and the base composition using custom programs written in C. Statistical analysis was performed with R software <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>.</p>
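            <p>The per-scaffold statistics listed above were computed with the authors' own C programs, which are not reproduced here. The Python sketch below is an independent illustration that assumes gaps are written as runs of 'N' in the scaffold sequence and defines the distance between gaps as the number of bases separating two consecutive gap runs; both are assumptions, as are the names used. AT frequency is computed over unambiguous bases only, so gaps are excluded.</p>
            <p>
```python
import re
from dataclasses import dataclass


@dataclass
class ScaffoldStats:
    length: int
    n_gaps: int
    gap_distances: list[int]  # bases between consecutive gap runs
    at_frequency: float       # (A + T) / (A + C + G + T), gaps excluded


def scaffold_stats(seq: str) -> ScaffoldStats:
    seq = seq.upper()
    # (start, end) of each run of N, assumed here to mark an assembly gap
    gaps = [m.span() for m in re.finditer(r"N+", seq)]
    # Distance from the end of one gap run to the start of the next
    distances = [start - prev_end for (_, prev_end), (start, _) in zip(gaps, gaps[1:])]
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return ScaffoldStats(
        length=len(seq),
        n_gaps=len(gaps),
        gap_distances=distances,
        at_frequency=at / (at + gc) if at + gc else float("nan"),
    )
```
            </p>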
            <p>To compare the AT frequency of the SSD scaffolds with that of their corresponding BBH, we derived the variance, <it>V</it>, of <it>M</it>, the average over all pairs of the difference in AT frequency between the two sequences. Under the null hypothesis of no difference in AT composition, <it>M </it>approximately follows a normal distribution with mean 0 and variance <it>V</it>:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2008-9-1-r5-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>V</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mn>1</m:mn>
                              <m:mrow>
                                 <m:msup>
                                    <m:mi>n</m:mi>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                              </m:mrow>
                           </m:mfrac>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>n</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>f</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:msub>
                                          <m:mi>f</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>+</m:mo>
                                       <m:mi>f</m:mi>
                                       <m:msub>
                                          <m:mo>'</m:mo>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>f</m:mi>
                                       <m:msub>
                                          <m:mo>'</m:mo>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>k</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGwbGaeyypa0tcfa4aaSaaaeaacaaIXaaabaGaamOBamaaCaaabeqaaiaaikdaaaaaaOWaaabCaKqbagaadaWcaaqaaiaadAgadaWgaaqaaiaadMgaaeqaaiaacIcacaaIXaGaeyOeI0IaamOzamaaBaaabaGaamyAaaqabaGaaiykaiabgUcaRiaadAgacaGGNaWaaSbaaeaacaWGPbaabeaacaGGOaGaaGymaiabgkHiTiaadAgacaGGNaWaaSbaaeaacaWGPbaabeaacaGGPaaabaGaam4AamaaBaaabaGaamyAaaqabaaaaaWcbaGaamyAaiabg2da9iaaigdaaeaacaWGUbaaniabggHiLdaaaa@4DFB@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>n </it>is the number of SSD scaffolds used and <it>k</it><sub><it>i </it></sub>is the length of the alignment over which the AT frequencies of the <it>i</it>-th SSD scaffold, <it>f</it><sub><it>i</it></sub>, and of its BBH, <it>f'</it><sub><it>i</it></sub>, were computed.</p>
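            <p>A sketch of this test under the normal approximation stated above: <it>M </it>is the mean of the per-pair differences between <it>f</it><sub><it>i</it></sub> and <it>f'</it><sub><it>i</it></sub>, <it>V </it>is computed from the formula, and the two-sided <it>p </it>value follows from the standard normal distribution of <it>M</it> divided by the square root of <it>V</it>. The function name and the input format (one triple per scaffold) are assumptions, not the authors' code.</p>
            <p>
```python
import math


def at_difference_test(pairs: list[tuple[float, float, int]]) -> tuple[float, float, float]:
    """Test for a mean AT-frequency difference between SSD scaffolds and their BBHs.

    pairs: (f_i, f_prime_i, k_i) = AT frequency of scaffold i, AT frequency of its
    BBH, and the alignment length over which both were computed.
    Returns (M, V, two-sided p-value) under the normal approximation.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair")
    m = sum(f - fp for f, fp, _ in pairs) / n
    # V = (1 / n^2) * sum_i [f_i (1 - f_i) + f'_i (1 - f'_i)] / k_i
    v = sum((f * (1 - f) + fp * (1 - fp)) / k for f, fp, k in pairs) / n**2
    if v == 0:
        raise ValueError("zero variance: all frequencies are 0 or 1")
    z = m / math.sqrt(v)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return m, v, p
```
            </p>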
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>BBH, best blast hit; EF, elongation factor; GOS, Global Ocean Survey; RPB1, large subunit of RNA polymerase II; SSD, Sargasso Sea Database.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>GP designed the study and performed data analysis. YD performed phylogenetic analysis. ED provided <it>Ostreococcus </it>sequences and helped with data analysis. GP and HM wrote the paper. All authors have read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> shows the supertree of 28S rRNA, a consensus of 498,000 trees. Additional data file <supplr sid="S2">2</supplr> is the supertree of 18S rRNA, a consensus of 423,000 trees. Additional data file <supplr sid="S3">3</supplr> is a table listing the models chosen for each subtree with ModelTest.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Supertree of 28S rRNA, a consensus of 498,000 trees</p>
            </caption>
            <text>
               <p>Supertree of 28S rRNA, a consensus of 498,000 trees.</p>
            </text>
            <file name="gb-2008-9-1-r5-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Supertree of 18S rRNA, a consensus of 423,000 trees</p>
            </caption>
            <text>
               <p>Supertree of 18S rRNA, a consensus of 423,000 trees.</p>
            </text>
            <file name="gb-2008-9-1-r5-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Models chosen for each subtree with ModelTest</p>
            </caption>
            <text>
               <p>Models chosen for each subtree with ModelTest.</p>
            </text>
            <file name="gb-2008-9-1-r5-S3.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported by the Centre National de la Recherche Scientifique and the Universit&#233; Pierre et Marie-Curie (Paris VI). We would like to thank Nigel Grimsley for insightful comments and Sebastien Gourbiere for statistical expertise. We are grateful to Yves van de Peer's Bioinformatics and Evolutionary Genomics lab at Ghent University for their work on the gene annotation of <it>O</it>.<it>tauri </it>(special thanks to Stephan Rombauts, Steven Robens and Pierre Rouze) and access to computing facilities. The work presented here was conducted within the framework of the 'Marine Genomics Europe' European Network of Excellence (2004-2008) (GOCE-CT-2004-505403).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Metagenomics -- the key to the uncultured microbes.</p>
            </title>
            <aug>
               <au>
                  <snm>Streit</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Schmitz</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Curr Opin Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>7</volume>
            <fpage>492</fpage>
            <lpage>498</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.mib.2004.08.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15451504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Metagenomics for studying unculturable microorganisms: cutting the Gordian knot.</p>
            </title>
            <aug>
               <au>
                  <snm>Schloss</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>229</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1273625</pubid>
                  <pubid idtype="pmpid" link="fulltext">16086859</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-8-229</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Comparative metagenomics of microbial communities.</p>
            </title>
            <aug>
               <au>
                  <snm>Tringe</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Salamov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>HW</fnm>
               </au>
               <au>
                  <snm>Podar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Short</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Mathur</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Detter</snm>
                  <fnm>JC</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>554</fpage>
            <lpage>557</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1107851</pubid>
                  <pubid idtype="pmpid" link="fulltext">15845853</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Metagenomic marine nitrogen fixation -- feast or famine?</p>
            </title>
            <aug>
               <au>
                  <snm>Johnston</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ogilvie</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>13</volume>
            <fpage>416</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tim.2005.07.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">16043354</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Quantitative phylogenetic assessment of microbial communities in diverse environments.</p>
            </title>
            <aug>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Raes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tringe</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Doerks</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2007</pubdate>
            <volume>315</volume>
            <fpage>1126</fpage>
            <lpage>1130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1133420</pubid>
                  <pubid idtype="pmpid" link="fulltext">17272687</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Whither microbiology? Phylogenetic trees.</p>
            </title>
            <aug>
               <au>
                  <snm>Woese</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>1060</fpage>
            <lpage>1063</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(02)70664-7</pubid>
                  <pubid idtype="pmpid">8805350</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Primary production of prochlorophytes, cyanobacteria, and eukaryotic ultraphytoplankton: Measurements from flow cytometric sorting.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Limnol Oceanogr</source>
            <pubdate>1994</pubdate>
            <volume>39</volume>
            <fpage>169</fpage>
            <lpage>175</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The importance of Prochlorococcus to community structure in the central North Pacific Ocean.</p>
            </title>
            <aug>
               <au>
                  <snm>Campbell</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Nolla</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Vaulot</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Limnol Oceanogr</source>
            <pubdate>1994</pubdate>
            <volume>39</volume>
            <fpage>954</fpage>
            <lpage>961</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Rocap</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Distel</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Waterbury</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Chisholm</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>68</volume>
            <fpage>1180</fpage>
            <lpage>1191</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">123739</pubid>
                  <pubid idtype="pmpid" link="fulltext">11872466</pubid>
                  <pubid idtype="doi">10.1128/AEM.68.3.1180-1191.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Genetic diversity in Sargasso Sea bacterioplankton.</p>
            </title>
            <aug>
               <au>
                  <snm>Giovannoni</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Britschgi</snm>
                  <fnm>TB</fnm>
               </au>
               <au>
                  <snm>Moyer</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Field</snm>
                  <fnm>KG</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1990</pubdate>
            <volume>345</volume>
            <fpage>60</fpage>
            <lpage>63</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/345060a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">2330053</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Kinetic bias in estimates of coastal picoplankton community structure obtained by measurements of small-subunit rRNA gene PCR amplicon length heterogeneity.</p>
            </title>
            <aug>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rappe</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Giovannoni</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>1998</pubdate>
            <volume>64</volume>
            <fpage>4522</fpage>
            <lpage>4529</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">106679</pubid>
                  <pubid idtype="pmpid" link="fulltext">9797317</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Environmental genome shotgun sequencing of the Sargasso Sea.</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>66</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1093857</pubid>
                  <pubid idtype="pmpid" link="fulltext">15001713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Prochlorococcus, a marine photosynthetic prokaryote of global significance.</p>
            </title>
            <aug>
               <au>
                  <snm>Partensky</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hess</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Vaulot</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>1999</pubdate>
            <volume>63</volume>
            <fpage>106</fpage>
            <lpage>127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98958</pubid>
                  <pubid idtype="pmpid" link="fulltext">10066832</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Assessing the dynamics and ecology of marine picophytoplankton: The importance of the eukaryotic component.</p>
            </title>
            <aug>
               <au>
                  <snm>Worden</snm>
                  <fnm>AZ</fnm>
               </au>
               <au>
                  <snm>Nolan</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Palenik</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Limnol Oceanogr</source>
            <pubdate>2004</pubdate>
            <volume>49</volume>
            <fpage>168</fpage>
            <lpage>179</lpage>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Are autotrophs less diverse than heterotrophs in marine picoplankton?</p>
            </title>
            <aug>
               <au>
                  <snm>Vaulot</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Romari</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Not</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>10</volume>
            <fpage>266</fpage>
            <lpage>267</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(02)02366-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">12088659</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The molecular ecology of microbial eukaryotes unveils a hidden world.</p>
            </title>
            <aug>
               <au>
                  <snm>Moreira</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lopez-Garcia</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>10</volume>
            <fpage>31</fpage>
            <lpage>38</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(01)02257-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">11755083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Diversity of picoplanktonic prasinophytes assessed by direct nuclear SSU rDNA sequencing of environmental samples and novel isolates retrieved from oceanic and coastal marine ecosystems.</p>
            </title>
            <aug>
               <au>
                  <snm>Guillou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eikrem</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Chretiennot-Dinet</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Le Gall</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Massana</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Romari</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pedros-Alio</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vaulot</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Protist</source>
            <pubdate>2004</pubdate>
            <volume>155</volume>
            <fpage>193</fpage>
            <lpage>214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1078/143446104774199592</pubid>
                  <pubid idtype="pmpid">15305796</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Diversity of free-living prokaryotes from a deep-sea site at the Antarctic Polar Front.</p>
            </title>
            <aug>
               <au>
                  <snm>Lopez-Garcia</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lopez-Lopez</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Moreira</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Valera</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Ecol</source>
            <pubdate>2001</pubdate>
            <volume>36</volume>
            <fpage>193</fpage>
            <lpage>202</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11451524</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity.</p>
            </title>
            <aug>
               <au>
                  <snm>Moon-van der Staay</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>De Wachter</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vaulot</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>607</fpage>
            <lpage>610</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35054541</pubid>
                  <pubid idtype="pmpid" link="fulltext">11214317</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A single species, Micromonas pusilla (Prasinophyceae), dominates the eukaryotic picoplankton in the Western English Channel.</p>
            </title>
            <aug>
               <au>
                  <snm>Not</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Latasa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marie</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cariou</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vaulot</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>70</volume>
            <fpage>4064</fpage>
            <lpage>4072</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">444783</pubid>
                  <pubid idtype="pmpid" link="fulltext">15240284</pubid>
                  <pubid idtype="doi">10.1128/AEM.70.7.4064-4072.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific.</p>
            </title>
            <aug>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm/>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>3</issue>
            <fpage>e77</fpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>DNA libraries for sequencing the genome of Ostreococcus tauri (Chlorophyta, Prasinophyceae): the smallest free-living eukaryotic cell.</p>
            </title>
            <aug>
               <au>
                  <snm>Derelle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ferraz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lagoda</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Eychenie</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cooke</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Regad</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Sabau</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Courties</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Delseny</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Demaille</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Phycol</source>
            <pubdate>2002</pubdate>
            <volume>38</volume>
            <fpage>1150</fpage>
            <lpage>1156</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1046/j.1529-8817.2002.02021.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Smallest eukaryotic organism.</p>
            </title>
            <aug>
               <au>
                  <snm>Courties</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vaquer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Troussellier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lautier</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chretiennot-Dinet</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Neveux</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Machado</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Claustre</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1994</pubdate>
            <volume>370</volume>
            <fpage>255</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1038/370255a0</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A new marine picoeucaryote: Ostreococcus tauri gen. et sp. nov. (Chlorophyta, Prasinophyceae).</p>
            </title>
            <aug>
               <au>
                  <snm>Chretiennot-Dinet</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Courties</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vaquer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Neveux</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Claustre</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lautier</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Machado</snm>
                  <fnm>MC</fnm>
               </au>
            </aug>
            <source>Phycologia</source>
            <pubdate>1995</pubdate>
            <volume>34</volume>
            <fpage>285</fpage>
            <lpage>292</lpage>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Taking metagenomic studies in context.</p>
            </title>
            <aug>
               <au>
                  <snm>Remington</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>13</volume>
            <fpage>404</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tim.2005.07.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">16039858</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The tree of eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Keeling</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Burger</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Durnford</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Lang</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Pearlman</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Roger</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>MW</fnm>
               </au>
            </aug>
            <source>Trends Ecol Evol</source>
            <pubdate>2005</pubdate>
            <volume>20</volume>
            <fpage>670</fpage>
            <lpage>676</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tree.2005.09.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">16701456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton.</p>
            </title>
            <aug>
               <au>
                  <snm>Lopez-Garcia</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Valera</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Pedros-Alio</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Moreira</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>603</fpage>
            <lpage>607</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35054537</pubid>
                  <pubid idtype="pmpid" link="fulltext">11214316</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Picoeukaryote diversity in coastal waters of the Pacific Ocean.</p>
            </title>
            <aug>
               <au>
                  <snm>Worden</snm>
                  <fnm>AZ</fnm>
               </au>
            </aug>
            <source>Aquat Microb Ecol</source>
            <pubdate>2006</pubdate>
            <volume>43</volume>
            <fpage>165</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubid idtype="doi">10.3354/ame043165</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Morphostasis in alveolate evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Leander</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Keeling</snm>
                  <fnm>PJ</fnm>
               </au>
            </aug>
            <source>Trends Ecol Evol</source>
            <pubdate>2003</pubdate>
            <volume>18</volume>
            <fpage>395</fpage>
            <lpage>402</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0169-5347(03)00152-6</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Ecotype diversity in the marine picoeukaryote Ostreococcus (Chlorophyta, Prasinophyceae).</p>
            </title>
            <aug>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Derelle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Guillou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Le Gall</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Vaulot</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Environ Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>7</volume>
            <fpage>853</fpage>
            <lpage>859</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1462-2920.2005.00758.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15892704</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Rapid quantification of the toxic alga Prymnesium parvum in natural samples by use of a specific monoclonal antibody and solid-phase cytometry.</p>
            </title>
            <aug>
               <au>
                  <snm>West</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Bacchieri</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tomas</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lebaron</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Applied and Environmental Microbiology</source>
            <pubdate>2006</pubdate>
            <volume>72</volume>
            <fpage>860</fpage>
            <lpage>868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1352178</pubid>
                  <pubid idtype="pmpid" link="fulltext">16391128</pubid>
                  <pubid idtype="doi">10.1128/AEM.72.1.860-868.2006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Community genomics among stratified microbial assemblages in the ocean's interior.</p>
            </title>
            <aug>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Preston</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Mincer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Hallam</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Frigaard</snm>
                  <fnm>NU</fnm>
               </au>
               <au>
                  <snm>Martinez</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Brito</snm>
                  <fnm>BR</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>311</volume>
            <fpage>496</fpage>
            <lpage>503</lpage>
            <xrefbib>
               <