<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2010-11-1-202</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Minireview</dochead>
      <bibl>
         <title>
            <p>Assembling genomes using short-read sequencing technology</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Jackman</snm>
               <mi>D</mi>
               <fnm>Shaun</fnm>
               <insr iid="I1"/>
            </au>
            <au ca="yes" id="A2">
               <snm>Birol</snm>
               <fnm>&#304;nan&#231;</fnm>
               <insr iid="I1"/>
               <email>ibirol@bcgsc.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>1</issue>
         <fpage>202</fpage>
         <url>http://genomebiology.com/content/11/1/202</url>
         <xrefbib>
            
         <pubidlist><pubid idtype="pmpid">20128932</pubid><pubid idtype="doi">10.1186/gb-2010-11-1-202</pubid></pubidlist></xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>28</day>
               <month>1</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>BioMed Central Ltd</collab>
      </cpyrt>
      <shorttitle>
         <p>Assembling genomes using short-read sequencing technology</p>
      </shorttitle>
      <shortabs>
         <p>Short-read sequencing technology can deliver gigabase-scale genome assemblies for under a million dollars.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>Gigabase-scale genome assemblies are now feasible using short-read sequencing technology, bringing the cost of such projects below the million-dollar mark.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="30010002" subtype="man_spc_id" type="BMC">Bioinformatics</classification>
         <classification id="30010010" subtype="man_spc_id" type="BMC">Genome studies</classification>
         <classification id="30010013" subtype="man_spc_id" type="BMC">Methods</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p/>
         </st>
         <p>Moore's law is often used as a predictor in the informatics field for the growth of processing power based on the increase in the number of transistors in integrated circuits. It states that, according to the historical trend, this number doubles roughly every 2 years. A similar trend manifests itself in the number of base pairs deposited in the GenBank database, which had a mere 680,338 base pairs (bp) in its December 1982 release. Twenty-seven years later, that number reached 110,118,557,163 bp in its core repository, and 158,317,168,385 bp in the Whole Genome Shotgun sequencing project repository. This increase corresponds to a doubling roughly every 17 months over 3 decades. If this trend is sustained, by the mid-21st century we will have enough sequencing data to cover the genomes of the entire projected human population of 9 billion with more than fivefold redundancy, and have several exabases (10<sup>18</sup> bp) remaining to sequence other species.</p>
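         <p>The doubling time quoted for GenBank can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the core and Whole Genome Shotgun totals are summed, that exactly 27 years separate the two releases, and that growth was steadily exponential; the function name is hypothetical.</p>

```python
import math

def doubling_time_months(start_bp: int, end_bp: int, months: float) -> float:
    """Months per doubling, assuming steady exponential growth."""
    doublings = math.log2(end_bp / start_bp)
    return months / doublings

# Figures quoted in the text; summing the two repositories is an assumption.
start = 680_338
end = 110_118_557_163 + 158_317_168_385
estimate = doubling_time_months(start, end, 27 * 12)
print(f"doubling time: about {estimate:.1f} months")
```

         <p>Under these assumptions the estimate falls between 17 and 18 months, consistent with the figure given above and shorter than the 2-year doubling of Moore's law.</p>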
         <p>This gap between the rates of growth of informatics and sequencing throughput is exerting a considerable strain on the development of bioinformatics tools to process the sequencing data generated. Hence, we need ever faster and more accurate algorithms to keep up with this increasing gap, much as media-specific compression algorithms such as those used by MP3 and DVD filled the gap between the digital media revolution and its storage requirements. This article focuses on three large and two smaller <it>de novo </it>sequencing projects, all published within the last 6 months, with a special emphasis on the recently published giant panda genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, which used a so-called next-generation sequencing (next-gen) platform from Illumina.</p>
         <p>Of the three major contenders in the next-gen sequencing field, the 454 platform from Roche generates the longest reads, and so its data are suited for <it>de novo </it>sequencing studies. However, it is also the most expensive per sequenced base to operate. The SOLiD platform from ABI sequences dinucleotides in color space rather than individual nucleotides. In color space representation, each of the 16 dinucleotides is assigned to one of four dyes. Each nucleotide is interrogated twice, which can improve accuracy, but the fact that each dye is shared by four dinucleotides complicates analysis. Hence, although less expensive to run, the SOLiD platform has mostly been used for resequencing studies. The Illumina platform is on a par with SOLiD in throughput and sequencing cost. However, it generates short-sequence data in nucleotide space and so is suitable for <it>de novo </it>sequencing. Although all three platforms were originally marketed for resequencing, with increasing read lengths, improving quality, and the development of protocols for paired-end reads, they are all now being used in <it>de novo </it>sequencing studies as well <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>.</p>
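         <p>The color space representation can be illustrated with a short sketch. The two-bit base codes and the exclusive-or rule used here are the commonly described encoding and are an assumption of this example, not something specified in this article; the function names are hypothetical.</p>

```python
# Two-bit codes for the bases; XOR of adjacent codes gives the colour (0 to 3).
# This mapping is an assumption for illustration, not taken from the article.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_color_space(seq: str) -> list[int]:
    """Encode each overlapping dinucleotide of seq as one of four colours."""
    codes = [BASE_CODE[base] for base in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]

def dinucleotides_per_color() -> dict[int, list[str]]:
    """Group the 16 dinucleotides by the colour they are assigned."""
    groups: dict[int, list[str]] = {color: [] for color in range(4)}
    for first in "ACGT":
        for second in "ACGT":
            groups[BASE_CODE[first] ^ BASE_CODE[second]].append(first + second)
    return groups

print(to_color_space("ATGGCA"))
print(dinucleotides_per_color())
```

         <p>Each base in the interior of a read contributes to two adjacent colors, which is the double interrogation described above, while each color is shared by four dinucleotides, so a color read can only be decoded given a known starting base.</p>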
      </sec>
      <sec>
         <st>
            <p>Recent <it>de novo </it>assemblies</p>
         </st>
         <p>Three genome projects recently published their results on the assembly and analysis of gigabase-scale genomes. For two of these, the B73 maize genome <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and the domestic horse genome <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, researchers took the more conventional approach of sequencing clones using capillary technology. In contrast, researchers on the third project - the panda genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> - exclusively used Illumina's short-read technology to sequence the complete genome.</p>
         <p>The B73 maize genome project followed the approach used by the original human genome project, using a physical map to select a minimum bacterial artificial chromosome (BAC) tiling path, and sequencing and assembling the selected clones to construct the <it>Zea mays </it>ssp. <it>mays </it>L. genome <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The high prevalence of repeat elements, constituting about 85% of the 10-chromosome, 2.3-gigabase genome, necessitated this rather conservative strategy. The project team assembled the 4&#215; to 6&#215; coverage data from capillary (Sanger) sequencing of a BAC library of 16,848 clones using Phrap <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, confirmed the assembly by BAC end sequencing, and refined it by sequencing 63 fosmid clones. The resulting assembly contains 125,325 contigs (61,161 scaffolds) with a contig (scaffold) N50 of 40 kb (76 kb), reconstructing 89% of the genome, with N50 denoting the weighted median; for a given assembly, half the genome is assembled in contigs larger than its N50. The estimated cost of the project, excluding the bioinformatics cost, is around US$30 million.</p>
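         <p>The N50 statistic can be computed as sketched below. This is a minimal illustration on hypothetical contig lengths; it measures N50 against the summed length of the assembly, which is a common convention, rather than against an independent estimate of genome size.</p>

```python
def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length L or more
    together hold at least half of the total assembled length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical contig lengths
print(n50(toy_contigs))
```

         <p>Because the statistic is weighted by length, a handful of long contigs can raise N50 substantially even when the assembly also contains many short ones, which is why contig counts and N50 are reported together.</p>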
         <p>The project team for the domestic horse genome reported the second version of the draft <it>Equus caballus </it>genome <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, which has 31 pairs of autosomes and one pair of sex chromosomes. Genome length is estimated to be between 2.5 and 2.7 Gb. Sampling the genome of a thoroughbred mare, three clone libraries were generated: 4-kb and 10-kb inserts, and 40-kb fosmids, yielding sequence fold-coverages of 4.96&#215;, 1.42&#215; and 0.40&#215;, respectively, on the capillary sequencing platform to a total of 6.8&#215; coverage. To improve the contiguity of the draft assembly, the team used end sequences of 314,972 BACs derived from a half-brother of the sequenced mare. The horse genome was assembled by Arachne 2.0 <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> to obtain a contig (scaffold) N50 of 112 kb (46 Mb), with about 46% of the assembled genome in repetitive sequences. The use of a whole-genome shotgun approach reduced the cost of this project to half that of the maize project.</p>
         <p>The above two projects used capillary sequencing data. In contrast, the giant panda genome project used Illumina sequencing data with an average read length of 52 bp and 73&#215; coverage to assemble the <it>Ailuropoda melanoleuca </it>genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, which, at an estimated 2.4-2.5 Gb, is of comparable length to the other two genomes. The assembly was performed in two stages using SOAPdenovo <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In the first stage, the project team used paired-end sequencing data from 26 fragment libraries with nominal fragment sizes ranging from 110 bp to 570 bp. In the second stage, they used the pairing information from these libraries and from 11 long-insert libraries of lengths 2 kb, 5 kb and 10 kb in successive iterations to scaffold the initial contigs. The resulting draft assembly is reported to have a contig (scaffold) N50 of 40 kb (1.3 Mb), reconstructing an estimated 92% of the genome. They also report that 36% of the panda genome is composed of transposable elements. The estimated cost of sequencing for this project is well under US$1 million, making it 25 to 50 times more cost-efficient than the B73 maize and horse genome projects.</p>
         <p>The extensive use of the Illumina short-read technology, and the longer reads from the 454 machine, for the <it>de novo </it>assembly of shorter genomes have been reported at recent conferences, and those studies have started to be published <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Table <tblr tid="T1">1</tblr> compares the three genome projects described above and the recent genome assemblies of the filamentous fungus <it>Grosmannia clavigera </it>(blue-stain fungus) <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and of the bacterial pathogen <it>Pseudomonas syringae </it>pathovar <it>tabaci </it>11528 <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, both of which used next-gen sequencing.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Assembly statistics for maize, horse, panda, blue-stain fungus (<it>G. clavigera</it>) and <it>P. syringae </it>genomes and their cost</p>
            </caption>
            <tblbdy cols="6">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>B73 maize</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Domestic horse*</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>Giant panda<sup>&#8224;</sup></b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>G. clavigera</it>
                           <sup>&#8224;</sup>
                        </b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>
                           <it>P. syringae</it>
                           <sup>&#8224;</sup>
                        </b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Genome length</p>
                  </c>
                  <c ca="center">
                     <p>2.3 Gb</p>
                  </c>
                  <c ca="center">
                     <p>2.5-2.7 Gb</p>
                  </c>
                  <c ca="center">
                     <p>2.4-2.5 Gb</p>
                  </c>
                  <c ca="center">
                     <p>32.5 Mb</p>
                  </c>
                  <c ca="center">
                     <p>6.1 Mb</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Sequencing technology/ies</p>
                  </c>
                  <c ca="center">
                     <p>Sanger</p>
                  </c>
                  <c ca="center">
                     <p>Sanger</p>
                  </c>
                  <c ca="center">
                     <p>Illumina</p>
                  </c>
                  <c ca="center">
                     <p>Sanger, 454, Illumina</p>
                  </c>
                  <c ca="center">
                     <p>Illumina</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Number of contigs</p>
                  </c>
                  <c ca="center">
                     <p>125,325</p>
                  </c>
                  <c ca="center">
                     <p>55,316</p>
                  </c>
                  <c ca="center">
                     <p>198,274</p>
                  </c>
                  <c ca="center">
                     <p>3,361</p>
                  </c>
                  <c ca="center">
                     <p>1,346</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Contig N50</p>
                  </c>
                  <c ca="center">
                     <p>40 kb</p>
                  </c>
                  <c ca="center">
                     <p>112 kb</p>
                  </c>
                  <c ca="center">
                     <p>40 kb</p>
                  </c>
                  <c ca="center">
                     <p>32 kb</p>
                  </c>
                  <c ca="center">
                     <p>11 kb</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Number of scaffolds</p>
                  </c>
                  <c ca="center">
                     <p>61,161</p>
                  </c>
                  <c ca="center">
                     <p>9,687</p>
                  </c>
                  <c ca="center">
                     <p>81,469</p>
                  </c>
                  <c ca="center">
                     <p>2,322</p>
                  </c>
                  <c ca="center">
                     <p>71</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Scaffold N50</p>
                  </c>
                  <c ca="center">
                     <p>76 kb</p>
                  </c>
                  <c ca="center">
                     <p>46 Mb</p>
                  </c>
                  <c ca="center">
                     <p>1.3 Mb</p>
                  </c>
                  <c ca="center">
                     <p>782 kb</p>
                  </c>
                  <c ca="center">
                     <p>317 kb</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Estimated sequencing cost</p>
                  </c>
                  <c ca="center">
                     <p>$30 million</p>
                  </c>
                  <c ca="center">
                     <p>$15 million</p>
                  </c>
                  <c ca="center">
                     <p>$0.6 million</p>
                  </c>
                  <c ca="center">
                     <p>$100,000</p>
                  </c>
                  <c ca="center">
                     <p>$4,000</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Contiguity statistics are calculated for *contigs and scaffolds 1 kb or longer and <sup>&#8224;</sup>contigs and scaffolds 100 bp or longer.</p>
            </tblfn>
         </tbl>
         <p>Arguably, even if state-of-the-art sequencing protocols and bioinformatics tools are used, genomes with high repeat content, such as B73 maize, may still not yield to short-read sequencing. However, if the success and the quality of the paradigm used by the giant panda genome project team is validated and reproduced, new <it>de novo </it>sequencing projects for complex genomes will benefit from the reduction in cost as well as the time efficiencies offered by the short-read technologies.</p>
      </sec>
      <sec>
         <st>
            <p>Assembly tools</p>
         </st>
         <p>The enabling paradigm behind the <it>de novo </it>assembly of the giant panda genome is based on a de Bruijn graph representation of short sequence overlaps. A de Bruijn graph is a directed graph where vertices are strings of length k and edges represent overlaps of k-1 symbols, or nucleotides in the case of genome sequences. This approach was introduced to the field by Pevzner and coworkers with the Euler software <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, and was made popular by the software Velvet <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The first application of the technology for mammalian-sized genomes was demonstrated by Simpson <it>et al</it>. using ABySS <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
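         <p>The graph definition above can be made concrete with a toy example. The sketch below is a deliberately simplified illustration and not the implementation used by Euler, Velvet, ABySS or SOAPdenovo: it ignores reverse complements, sequencing errors and k-mer coverage, and the function name and reads are hypothetical.</p>

```python
from collections import defaultdict

def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer vertex to the observed k-mers that follow it
    with an overlap of k-1 nucleotides."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(set)
    for kmer in kmers:
        for base in "ACGT":
            successor = kmer[1:] + base
            if successor in kmers:
                graph[kmer].add(successor)
    return dict(graph)

# Two hypothetical reads that overlap by three nucleotides (GGC).
graph = de_bruijn(["ATGGC", "GGCTA"], k=3)
for vertex in sorted(graph):
    print(vertex, "->", sorted(graph[vertex]))
```

         <p>In this toy graph every vertex has at most one successor, so walking the edges from ATG spells out the single contig ATGGCTA. In real data, repeats and errors create branches, and contigs end wherever such a walk becomes ambiguous.</p>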
         <p>These tools produce first-pass draft assemblies using a de Bruijn graph, followed by contig merging using paired-end information. For the latter stage, several groups have developed alternative ways of using the information in the read pairs. The ALLPATHS algorithm <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> uses the paired-end information in layers, starting with the large-fragment libraries to build 20 kb regions, called neighborhoods, around unique contigs, called seeds. The short-fragment pairs are then used to assemble the neighborhood, including the repetitive regions between the seeds. The panda assembly <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> used a similar layered approach to its fragment libraries, but started with the shorter-fragment libraries and proceeded to the longer-fragment libraries.</p>
         <p>The authors of Velvet suggest in a subsequent paper <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> that shorter-fragment libraries may be unnecessary. They argue that the distance between two nearby contigs can be calculated by comparing their distances, estimated using a large-fragment library, to a third more distant contig. The distance between the two nearby contigs is logically the difference between their distances to the distant contig.</p>
         <p>In ABySS <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, multiple libraries of differently sized fragments are considered simultaneously. Distances between pairs of contigs are estimated using each fragment library on its own, and the most accurate distance estimates between contig pairs, which typically come from the library with the smallest fragments that span each distance, are retained. After smaller contigs have been merged into larger contigs, cases that could not be resolved in previous iterations are then reconsidered.</p>
         <p>Producing the best possible de Bruijn graph assembly requires optimizing the fundamental parameter of k-mer size, which determines the length of significant overlaps for contig growth. Li <it>et al</it>. <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> report obtaining a single-end contig N50 of 1,483 bp using k = 27 with SOAPdenovo <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Reassembling their cleaned sequence data using ABySS 1.1.0 <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> without paired-end information, we obtained a contig N50 of 1,381 bp using k = 27, and an improved N50 of 1,952 bp using k = 35 (see Table <tblr tid="T2">2</tblr>). This shows that although the contiguity of the final panda assembly is already adequate for a genome of this size, it might be improved further by using a larger k-mer size.</p>
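         <p>The selection rule described for multiple libraries can be sketched as follows. This is a simplified illustration under stated assumptions, not the ABySS code: it uses fragment size as a stand-in for accuracy, and the function name, the dictionary layout and the numbers are all hypothetical.</p>

```python
from typing import Optional

def pick_estimate(estimates: dict[int, Optional[float]]) -> Optional[float]:
    """Return the distance estimate from the smallest-fragment library that
    spans the gap. Keys are nominal fragment sizes in bp; a value of None
    means that no read pairs from that library link the two contigs."""
    for fragment_size in sorted(estimates):
        estimate = estimates[fragment_size]
        if estimate is not None:
            return estimate
    return None

# Hypothetical estimates for one contig pair from three libraries.
per_library = {200: None, 500: 310.0, 2000: 290.0}
print(pick_estimate(per_library))
```

         <p>Here the 200 bp library cannot span the gap, so the estimate from the 500 bp library is retained in preference to the one from the 2 kb library, whose wider fragment-size distribution would typically make it less precise.</p>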
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Effect of the choice of k-mer size on the single-end contig N50 for the giant panda assembly using ABySS 1.1.0</p>
            </caption>
            <tblbdy cols="9">
               <r>
                  <c ca="left">
                     <p>
                        <b>k-mer size</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>27*</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>30</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>32</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>34</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>35</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>36</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>37</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <b>38</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="9">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Contig N50 (bp)</p>
                  </c>
                  <c ca="center">
                     <p>1,381</p>
                  </c>
                  <c ca="center">
                     <p>1,724</p>
                  </c>
                  <c ca="center">
                     <p>1,863</p>
                  </c>
                  <c ca="center">
                     <p>1,940</p>
                  </c>
                  <c ca="center">
                     <p>1,952</p>
                  </c>
                  <c ca="center">
                     <p>1,942</p>
                  </c>
                  <c ca="center">
                     <p>1,924</p>
                  </c>
                  <c ca="center">
                     <p>1,860</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>*The k-mer size used for the reported giant panda genome assembly <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
            </tblfn>
         </tbl>
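         <p>The parameter sweep summarized in Table 2 amounts to choosing the k-mer size that maximizes contig N50. The sketch below simply re-reads the values from Table 2; it does not run an assembler, and the variable names are hypothetical.</p>

```python
# Single-end contig N50 (bp) by k-mer size, copied from Table 2.
n50_by_k = {27: 1381, 30: 1724, 32: 1863, 34: 1940,
            35: 1952, 36: 1942, 37: 1924, 38: 1860}

best_k = max(n50_by_k, key=n50_by_k.get)
gain = n50_by_k[best_k] / n50_by_k[27] - 1

print(f"best k = {best_k}, N50 = {n50_by_k[best_k]} bp")
print(f"improvement over k = 27: {gain:.0%}")
```

         <p>The values rise to a single peak at k = 35 and decline on either side, so for data such as these a coarse scan over k followed by a finer scan around the best value would be a reasonable way to limit the number of trial assemblies.</p>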
         <p>The five genomes noted in this article have different levels of completeness, and the cost estimates we report are based on a number of assumptions and on the summary numbers reported in the respective studies. Furthermore, they exclude any costs related to the bioinformatics activities. As such, the sequencing costs are not directly comparable. Nevertheless, at face value, a pattern emerges that favors the short-read technology. This is not news, certainly, as it is the underlying premise of the next-gen platforms, yet the short-read assembly studies cited show that bioinformatics is catching up with the pace of data generation by these platforms. Thus, with software tools maturing and experimental protocols being refined, the number of genomes assembled with short reads will increase, and their size will expand.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The sequence and <it>de novo </it>assembly of the giant panda genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fan</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Cai</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Jian</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Xuan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ryder</snm>
                  <fnm>OA</fnm>
               </au>
               <au>
                  <snm>Leung</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2009</pubdate>
            <volume>463</volume>
            <fpage>311</fpage>
            <lpage>317</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature08696</pubid>
                  <pubid idtype="pmpid" link="fulltext">20010809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data.</p>
            </title>
            <aug>
               <au>
                  <snm>Diguistini</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liao</snm>
                  <fnm>NY</fnm>
               </au>
               <au>
                  <snm>Platt</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Seidel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Docking</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Birol</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Hirst</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mardis</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hamelin</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Bohlmann</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Breuil</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>R94</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2768983</pubid>
                  <pubid idtype="pmpid" link="fulltext">19747388</pubid>
                  <pubid idtype="doi">10.1186/gb-2009-10-9-r94</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The B73 maize genome: complexity, diversity, and dynamics.</p>
            </title>
            <aug>
               <au>
                  <snm>Schnable</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Ware</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Pasternak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fulton</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Graves</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Minx</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Reily</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Courtney</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kruchowski</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Tomlinson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Strong</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Delehaunty</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Fronick</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Courtney</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rock</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Belter</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Abbott</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Cotton</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marchetto</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ochoa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Gillam</snm>
                  <fnm>B</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2009</pubdate>
            <volume>326</volume>
            <fpage>1112</fpage>
            <lpage>1115</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1178534</pubid>
                  <pubid idtype="pmpid" link="fulltext">19965430</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Genome sequence, comparative analysis, and population genetics of the domestic horse.</p>
            </title>
            <aug>
               <au>
                  <snm>Wade</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Giulotto</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sigurdsson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zoli</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gnerre</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Imsland</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Lear</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Adelson</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bellone</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Bl&#246;cker</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Distl</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Garber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Leeb</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mauceli</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>MacLeod</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Penedo</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Raison</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Sharpe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Andersson</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Antczak</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Biagi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Binns</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Chowdhary</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Coleman</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Della Valle</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fryc</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gu&#233;rin</snm>
                  <fnm>G</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2009</pubdate>
            <volume>326</volume>
            <fpage>865</fpage>
            <lpage>867</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1178158</pubid>
                  <pubid idtype="pmpid" link="fulltext">19892987</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Phrap</p>
            </title>
            <url>http://www.phrap.org/phredphrap/phrap.html</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>ARACHNE: a whole-genome shotgun assembler.</p>
            </title>
            <aug>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jaffe</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Stanley</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gnerre</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mauceli</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Berger</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>177</fpage>
            <lpage>189</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155255</pubid>
                  <pubid idtype="pmpid" link="fulltext">11779843</pubid>
                  <pubid idtype="doi">10.1101/gr.208902</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>SOAP: short oligonucleotide analysis package</p>
            </title>
            <url>http://soap.genomics.org.cn/soapdenovo.html</url>
         </bibl>
         <bibl id="B8">
            <title>
               <p>A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of <it>Pseudomonas syringae</it> pathovar <it>tabaci</it> 11528.</p>
            </title>
            <aug>
               <au>
                  <snm>Studholme</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Ibanez</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>MacLean</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dangl</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Rathjen</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>395</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2745422</pubid>
                  <pubid idtype="pmpid" link="fulltext">19703286</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-10-395</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley.</p>
            </title>
            <aug>
               <au>
                  <snm>Steuernagel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Taudien</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gundlach</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Seidel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ariyadasa</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schulte</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Petzold</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Felder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Graner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Scholz</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Platzer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>547</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2784808</pubid>
                  <pubid idtype="pmpid" link="fulltext">19930547</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-10-547</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>De novo assembly of the <it>Pseudomonas syringae</it> pv. <it>syringae</it> B728a genome using Illumina/Solexa short sequence reads.</p>
            </title>
            <aug>
               <au>
                  <snm>Farrer</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kemen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Studholme</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>2009</pubdate>
            <volume>291</volume>
            <fpage>103</fpage>
            <lpage>111</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1574-6968.2008.01441.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">19077061</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>An Eulerian path approach to DNA fragment assembly.</p>
            </title>
            <aug>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Waterman</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>9748</fpage>
            <lpage>9753</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55524</pubid>
                  <pubid idtype="pmpid" link="fulltext">11504945</pubid>
                  <pubid idtype="doi">10.1073/pnas.171285098</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Velvet: algorithms for de novo short read assembly using de Bruijn graphs.</p>
            </title>
            <aug>
               <au>
                  <snm>Zerbino</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <fpage>821</fpage>
            <lpage>829</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2336801</pubid>
                  <pubid idtype="pmpid" link="fulltext">18349386</pubid>
                  <pubid idtype="doi">10.1101/gr.074492.107</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>ABySS: a parallel assembler for short read sequence data.</p>
            </title>
            <aug>
               <au>
                  <snm>Simpson</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jackman</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Schein</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Birol</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2009</pubdate>
            <volume>19</volume>
            <fpage>1117</fpage>
            <lpage>1123</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2694472</pubid>
                  <pubid idtype="pmpid" link="fulltext">19251739</pubid>
                  <pubid idtype="doi">10.1101/gr.089532.108</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>ALLPATHS: de novo assembly of whole-genome shotgun microreads.</p>
            </title>
            <aug>
               <au>
                  <snm>Butler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>MacCallum</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kleber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shlyakhter</snm>
                  <fnm>IA</fnm>
               </au>
               <au>
                  <snm>Belmonte</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Nusbaum</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jaffe</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <fpage>810</fpage>
            <lpage>820</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2336810</pubid>
                  <pubid idtype="pmpid" link="fulltext">18340039</pubid>
                  <pubid idtype="doi">10.1101/gr.7337908</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Pebble and rock band: heuristic resolution of repeats and scaffolding in the Velvet short-read <it>de novo</it> assembler.</p>
            </title>
            <aug>
               <au>
                  <snm>Zerbino</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>McEwen</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Margulies</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2009</pubdate>
            <volume>4</volume>
            <fpage>e8407</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2793427</pubid>
                  <pubid idtype="pmpid" link="fulltext">20027311</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0008407</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>