<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>gb-2010-11-5-r52</ui><ji>GBJ</ji><fm>
<dochead>Research</dochead>
<bibl>
<title>
<p>Towards a comprehensive structural variation map of an individual human genome</p>
</title>
<aug>
<au id="A1"><snm>Pang</snm><mi>W</mi><fnm>Andy</fnm><insr iid="I1"/><insr iid="I2"/><email>andypang@sickkids.ca</email></au>
<au id="A2"><snm>MacDonald</snm><mi>R</mi><fnm>Jeffrey</fnm><insr iid="I2"/><email>jmacdonald@sickkids.ca</email></au>
<au id="A3"><snm>Pinto</snm><fnm>Dalila</fnm><insr iid="I2"/><email>dcpinto@sickkids.ca</email></au>
<au id="A4"><snm>Wei</snm><fnm>John</fnm><insr iid="I2"/><email>wei@sickkids.ca</email></au>
<au id="A5"><snm>Rafiq</snm><mi>A</mi><fnm>Muhammad</fnm><insr iid="I2"/><email>arshad115@yahoo.com</email></au>
<au id="A6"><snm>Conrad</snm><mi>F</mi><fnm>Donald</fnm><insr iid="I3"/><email>dc4@sanger.ac.uk</email></au>
<au id="A7"><snm>Park</snm><fnm>Hansoo</fnm><insr iid="I4"/><email>hspark27@naver.com</email></au>
<au id="A8"><snm>Hurles</snm><mi>E</mi><fnm>Matthew</fnm><insr iid="I3"/><email>meh@sanger.ac.uk</email></au>
<au id="A9"><snm>Lee</snm><fnm>Charles</fnm><insr iid="I4"/><email>clee@rics.bwh.harvard.edu</email></au>
<au id="A10"><snm>Venter</snm><fnm>J Craig</fnm><insr iid="I5"/><email>jcventer@venterinstitute.org</email></au>
<au id="A11"><snm>Kirkness</snm><mi>F</mi><fnm>Ewen</fnm><insr iid="I5"/><email>ekirknes@jcvi.org</email></au>
<au id="A12"><snm>Levy</snm><fnm>Samuel</fnm><insr iid="I5"/><email>slevy@jcvi.org</email></au>
<au ca="yes" id="A13" ce="yes"><snm>Feuk</snm><fnm>Lars</fnm><insr iid="I2"/><insr iid="I6"/><email>lars.feuk@genpat.uu.se</email></au>
<au ca="yes" id="A14" ce="yes"><snm>Scherer</snm><mi>W</mi><fnm>Stephen</fnm><insr iid="I1"/><insr iid="I2"/><email>stephen.scherer@sickkids.ca</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada</p></ins>
<ins id="I2"><p>The Centre for Applied Genomics, The Hospital for Sick Children, 101 College Street, Toronto, Ontario M5G 1L7, Canada</p></ins>
<ins id="I3"><p>Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK</p></ins>
<ins id="I4"><p>Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, 221 Longwood Avenue, Boston, Massachusetts 02115, USA</p></ins>
<ins id="I5"><p>J Craig Venter Institute, 9740 Medical Center Drive, Rockville, Maryland 20850, USA</p></ins>
<ins id="I6"><p>Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala 75185, Sweden</p></ins>
</insg>
<source>Genome Biology</source>
<issn>1465-6906</issn>
<pubdate>2010</pubdate>
<volume>11</volume>
<issue>5</issue>
<fpage>R52</fpage>
<url>http://genomebiology.com/2010/11/5/R52</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2010-11-5-r52</pubid><pubid idtype="pmpid">20482838</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>20</day><month>2</month><year>2010</year></date></rec><revrec><date><day>11</day><month>4</month><year>2010</year></date></revrec><acc><date><day>19</day><month>5</month><year>2010</year></date></acc><pub><date><day>19</day><month>5</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Pang et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<shorttitle>
<p>Human structural variation</p>
</shorttitle>
<shortabs>
<p>A comprehensive map of structural variation in the human genome provides a reference dataset for analyses of future personal genomes.</p>
</shortabs>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (&lt;10 bp) insertions/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and &gt;24% of structural variants would not be imputed by SNP-association.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. The significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate that they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.</p>
</sec>
</sec>
</abs>
</fm><meta>
<classifications>
<classification id="30010002" subtype="man_spc_id" type="BMC">Bioinformatics</classification>
<classification id="300100010" subtype="man_spc_id" type="BMC">Genome studies</classification>
</classifications>
</meta><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Comprehensive catalogues of genetic variation are crucial for genotype-phenotype correlation studies <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
</abbrgrp>, in particular when rare or multiple genetic variants underlie traits or disease susceptibility <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>. Since 2007, several personal genomes have been sequenced, capturing different extents of their genetic variation content (Additional file <supplr sid="S1">1</supplr>) <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp>. In the first publication (J Craig Venter's DNA named HuRef) <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>, variants were identified based on a comparison of the Venter assembly to the National Center for Biotechnology Information (NCBI) reference genome (build 36). In total, 3,213,401 SNPs and 796,167 structural variants (SVs; here SV encompasses all non-SNP variation) were identified in that study. Similar numbers of SNPs, but significantly fewer SVs (ranging from approximately 137,000 to approximately 400,000), are reported in other individual genome sequencing projects <abbrgrp>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
<abbr bid="B11">11</abbr>
</abbrgrp>. It is clear that even with deep sequence coverage, annotation of structural variation remains very challenging, and the full extent of SV in the human genome is still unknown.</p>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Genetic variation in sequenced genomes</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S1.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<p>Microarrays <abbrgrp>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
</abbrgrp> and sequencing <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp> have revealed that SV contributes significantly to the complement of human variation, often having unique population <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp> and disease <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp> characteristics. Despite this, there is limited overlap in independent studies of the same DNA source <abbrgrp>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
</abbrgrp>, indicating that each platform detects only a fraction of the existing variation, and that many SVs remain to be found. In a recent study using high-resolution comparative genomic hybridization arrays, the authors found that approximately 0.7% of the genome was variable in copy number in each hybridization of two samples <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>. Yet, these experiments were limited to the detection of unbalanced variation larger than 500 bp, and the total amount of variation between two genomes would therefore be expected to exceed 0.7%.</p>
<p>Our objective in the present study was to annotate the full spectrum of genetic variation in a single genome. We used the previously sequenced Venter genome because DNA was available and we had full access to the genome sequence data. The assembly comparison method presented in the initial sequencing of this genome <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> discovered an unprecedented number of SVs in a single genome; however, the approach relied on an adequate diploid assembly. As there are known limitations in assembling alternative alleles for SV <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>, we expected that there was still a significant amount of variation to be found. In an attempt to capture the full spectrum of variation in a human genome, this current study uses multiple sequencing- and microarray-based strategies to complement the results of the assembly comparison approach in the Levy <it>et al. </it>
<abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> study. First, we detect genetic variation from the original Sanger sequence reads by direct alignment to the NCBI build 36 assembly, bypassing the assembly step. Second, using custom high density microarrays, we probe the Venter genome to identify variants in regions where sequencing-based approaches may have difficulties (Figure <figr fid="F1">1</figr>). We discover thousands of new SVs, but also find biases in each method's ability to detect variants. Our collective data reveal a continuous size distribution of genetic variants (Figure <figr fid="F2">2a</figr>) with approximately 1.58% of the Venter haploid genome encompassed by SVs (39,520,431 bp or 1.28% as unbalanced SVs and 9,257,035 bp or 0.30% as inversions) and 0.1% as SNPs (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F2">2</figr>). While there is still room for improvement, our results give the best estimate to date of the variation content in a human genome, provide an important resource of SVs for other personal genome studies, and highlight the importance of using multiple strategies for SV discovery.</p>
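<p>As a quick check of the percentages quoted above, the following Python sketch recomputes them from the base-pair totals in the text and in Table 1. The haploid genome size of 3.08 Gb is an assumption made here for illustration (it is not stated in the text), chosen because it reproduces the reported figures; the variable and function names are likewise illustrative.</p>
<p>
```python
# Base-pair totals as reported in the text and in Table 1.
GAIN_BP = 19_981_062        # non-redundant insertions/duplications
LOSS_BP = 19_539_369        # non-redundant deletions
INVERSION_BP = 9_257_035    # non-redundant inversions

# ASSUMPTION: haploid genome size used as the denominator (not given in the text).
ASSUMED_HAPLOID_GENOME_BP = 3_080_000_000


def percent_of_genome(bp, genome_bp=ASSUMED_HAPLOID_GENOME_BP):
    """Return bp as a percentage of the assumed haploid genome size."""
    return 100.0 * bp / genome_bp


# Unbalanced SVs are the sum of gains and losses; inversions are balanced.
unbalanced_bp = GAIN_BP + LOSS_BP
total_sv_bp = unbalanced_bp + INVERSION_BP

print(unbalanced_bp)                                # 39520431
print(round(percent_of_genome(unbalanced_bp), 2))   # 1.28
print(round(percent_of_genome(INVERSION_BP), 2))    # 0.3
print(round(percent_of_genome(total_sv_bp), 2))     # 1.58
```
</p>
<p>Under this assumed denominator, the gain and loss totals sum to the 39,520,431 bp of unbalanced SVs, and adding the inversions gives approximately 48.8 Mb of non-SNP variation.</p>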
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Overall workflow of the current study</p></caption><text>
   <p><b>Overall workflow of the current study</b>. Two distinct technologies were used to identify SV in the Venter genome: whole genome sequencing and genomic microarrays. The sequencing experiments, the construction of the Venter genome assembly, and the assembly comparison with NCBI build 36 (B36) reference had been completed in previous studies <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B16">16</abbr><abbr bid="B39">39</abbr></abbrgrp>. Hence, these experiments are shown as blue boxes. The scope of the current study is denoted in orange boxes. We re-analyzed the initial sequencing data, and searched for SVs in sequence alignments by the mate-pair and split-read approaches. We also used three distinct comparative genomic hybridization (CGH) array platforms: Agilent 24 M, NimbleGen 42 M and Agilent 244 K. Unlike the other array platforms, which were designed based on the B36 assembly, the Agilent 244 K targeted scaffold segments unique to the Celera/Venter assembly. To denote this, Figure 1 shows a dotted line connecting the assembly comparison outcome to the Agilent 244 K box. Finally, the Affymetrix 6.0 and Illumina 1 M SNP arrays were also used in the present study.</p>
</text><graphic file="gb-2010-11-5-r52-1" hint_layout="double"/></fig>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Size distribution of genetic variants</p></caption><text>
   <p><b>Size distribution of genetic variants</b>. <b>(a) </b>A non-redundant size spectrum of SNPs and CNVs (including indels) and a breakdown of the proportion of gains to losses. The indel/CNV dataset consists of variants detected by assembly comparison, mate-pair, split-read, NimbleGen 42 M comparative genomic hybridization (CGH) and Agilent 24 M. The results show that the number and the size of variants are negatively correlated. Although the proportions of gains and losses are roughly equal across the size spectrum, there are some deviations. Losses are more abundant in the 1 to 10 kb range, mainly because the 2-kb and 10-kb library mate-pair clones cannot detect insertions larger than their clone size. The opposite is seen for large events, where duplications are more common than deletions, which may be due to both biological and methodological biases. The increases in the number of events near 300 bp and 6 kb can be explained by short interspersed nuclear element (SINE) and long interspersed nuclear element (LINE) indels, respectively. The general peak around 10 kb corresponds to the interval with the highest clone coverage. <b>(b) </b>Size distribution of gains (insertions and duplications) highlighting the detection range of each methodology. The split-read method is designed to capture insertions from 11 bp to the size of a Sanger-based sequence read (approximately 1 kb). No insertions were detected in the size range between the 2-kb and 10-kb libraries using the mate-pair approach. Furthermore, due to technical limitations, large gains (&#8805; 100,000 bp) cannot be identified with the sequencing-based approaches, while these are readily identified by microarrays. <b>(c) </b>Size distribution of deletions.</p>
</text><graphic file="gb-2010-11-5-r52-2" hint_layout="double"/></fig>
<tbl hint_layout="double" id="T1"><title><p>Table 1</p></title><caption><p>Structural variants detected by different methods</p></caption><tblbdy cols="7">
      <r>
         <c ca="left">
            <p>
               <b>Method</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Type</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Number</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Minimum size (bp)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Median size (bp)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Maximum size (bp)</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total size (bp)</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>Assembly comparison</it>
               <sup>a</sup>
            </p>
         </c>
         <c ca="left">
            <p>
               <it>Homo. insertion</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>275,512</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>2</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>82,711</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>3,117,039</it>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>Homo. deletion</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>283,961</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>2</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>18,484</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>2,820,823</it>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>Hetero. insertion</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>136,792</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>321</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>336,374</it>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>Hetero. deletion</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>99,814</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>349</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>250,300</it>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <it>Inversion</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>88</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>102</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1,602</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>686,721</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>1,627,871</it>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mate-pair</p>
         </c>
         <c ca="left">
            <p>Insertion</p>
         </c>
         <c ca="center">
            <p>780</p>
         </c>
         <c ca="center">
            <p>346</p>
         </c>
         <c ca="center">
            <p>3,588</p>
         </c>
         <c ca="center">
            <p>28,344</p>
         </c>
         <c ca="center">
            <p>3,880,544</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Deletion</p>
         </c>
         <c ca="center">
            <p>1,494</p>
         </c>
         <c ca="center">
            <p>340</p>
         </c>
         <c ca="center">
            <p>3,611</p>
         </c>
         <c ca="center">
            <p>1,669,696</p>
         </c>
         <c ca="center">
            <p>10,531,345</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Inversion</p>
         </c>
         <c ca="center">
            <p>105</p>
         </c>
         <c ca="center">
            <p>368</p>
         </c>
         <c ca="center">
            <p>3,121</p>
         </c>
         <c ca="center">
            <p>2,026,495</p>
         </c>
         <c ca="center">
            <p>8,068,541</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Split-read</p>
         </c>
         <c ca="left">
            <p>Insertion</p>
         </c>
         <c ca="center">
            <p>8,511</p>
         </c>
         <c ca="center">
            <p>11</p>
         </c>
         <c ca="center">
            <p>16</p>
         </c>
         <c ca="center">
            <p>414</p>
         </c>
         <c ca="center">
            <p>224,022</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Deletion</p>
         </c>
         <c ca="center">
            <p>11,659</p>
         </c>
         <c ca="center">
            <p>11</p>
         </c>
         <c ca="center">
            <p>18</p>
         </c>
         <c ca="center">
            <p>111,714</p>
         </c>
         <c ca="center">
            <p>1,764,522</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Agilent 24 M</p>
         </c>
         <c ca="left">
            <p>Duplication</p>
         </c>
         <c ca="center">
            <p>194</p>
         </c>
         <c ca="center">
            <p>445</p>
         </c>
         <c ca="center">
            <p>1,274</p>
         </c>
         <c ca="center">
            <p>113,465</p>
         </c>
         <c ca="center">
            <p>1,065,617</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Deletion</p>
         </c>
         <c ca="center">
            <p>319</p>
         </c>
         <c ca="center">
            <p>439</p>
         </c>
         <c ca="center">
            <p>1,198</p>
         </c>
         <c ca="center">
            <p>852,404</p>
         </c>
         <c ca="center">
            <p>2,779,880</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NimbleGen 42 M</p>
         </c>
         <c ca="left">
            <p>Duplication</p>
         </c>
         <c ca="center">
            <p>366</p>
         </c>
         <c ca="center">
            <p>448</p>
         </c>
         <c ca="center">
            <p>4,665</p>
         </c>
         <c ca="center">
            <p>836,362</p>
         </c>
         <c ca="center">
            <p>11,292,451</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Deletion</p>
         </c>
         <c ca="center">
            <p>358</p>
         </c>
         <c ca="center">
            <p>459</p>
         </c>
         <c ca="center">
            <p>2,460</p>
         </c>
         <c ca="center">
            <p>359,736</p>
         </c>
         <c ca="center">
            <p>3,861,282</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Affymetrix 6.0</p>
         </c>
         <c ca="left">
            <p>Duplication</p>
         </c>
         <c ca="center">
            <p>17</p>
         </c>
         <c ca="center">
            <p>8,638</p>
         </c>
         <c ca="center">
            <p>42,798</p>
         </c>
         <c ca="center">
            <p>640,474</p>
         </c>
         <c ca="center">
            <p>2,011,557</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Deletion</p>
         </c>
         <c ca="center">
            <p>21</p>
         </c>
         <c ca="center">
            <p>2,280</p>
         </c>
         <c ca="center">
            <p>13,145</p>
         </c>
         <c ca="center">
            <p>856,671</p>
         </c>
         <c ca="center">
            <p>1,978,028</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Illumina 1 M</p>
         </c>
         <c ca="left">
            <p>Duplication</p>
         </c>
         <c ca="center">
            <p>3</p>
         </c>
         <c ca="center">
            <p>11,539</p>
         </c>
         <c ca="center">
            <p>22,148</p>
         </c>
         <c ca="center">
            <p>87,670</p>
         </c>
         <c ca="center">
            <p>121,357</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Deletion</p>
         </c>
         <c ca="center">
            <p>9</p>
         </c>
         <c ca="center">
            <p>8,576</p>
         </c>
         <c ca="center">
            <p>32,199</p>
         </c>
         <c ca="center">
            <p>145,662</p>
         </c>
         <c ca="center">
            <p>431,131</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Custom Agilent 244 K</p>
         </c>
         <c ca="left">
            <p>Duplication</p>
         </c>
         <c ca="center">
            <p>44</p>
         </c>
         <c ca="center">
            <p>219</p>
         </c>
         <c ca="center">
            <p>1,356</p>
         </c>
         <c ca="center">
            <p>8,737</p>
         </c>
         <c ca="center">
            <p>98,529</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Deletion</p>
         </c>
         <c ca="center">
            <p>7</p>
         </c>
         <c ca="center">
            <p>170</p>
         </c>
         <c ca="center">
            <p>332</p>
         </c>
         <c ca="center">
            <p>2,258</p>
         </c>
         <c ca="center">
            <p>4,130</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Non-redundant total<sup>b</sup></b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Insertion/duplication</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>417,206</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>1</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>1</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>836,362</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>19,981,062</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>Deletion</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>390,973</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>1</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>2</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>1,669,696</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>19,539,369</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>Inversion</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>167</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>102</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>1,249</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>2,026,495</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>9,257,035</b>
            </p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p><sup>a</sup>We used an italicized font to distinguish the results from the Levy <it>et al. </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp> study. Moreover, from that previous study, we included all homozygous indels, heterozygous indels, indels embedded within simple, bi-allelic, and non-ambiguously mapped heterozygous mixed sequence variants, and only those inversions whose size is at most 3 Mb. <sup>b</sup>Complete data are presented in Additional files <supplr sid="S19">19</supplr>, <supplr sid="S20">20</supplr> and <supplr sid="S21">21</supplr>. Non-redundant variation size distribution is presented in Figure 2a.</p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>Several different analytical and experimental strategies were employed to exhaustively analyze the Venter genome for SV. An overview of the different analyses performed is shown in Figure <figr fid="F1">1</figr>.</p>
<sec>
<st>
<p>Sequencing-based variation</p>
</st>
<p>We first used computational strategies to extract additional SV information from the existing Sanger-based sequencing data generated as paired-end (or mate-pair) reads from clone libraries of defined size <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. First, we adopted a paired-end mapping approach <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp> and aligned 11,346,790 mate-pairs from libraries with expected clone sizes of 2, 10 or 37 kb (Additional file <supplr sid="S2">2</supplr>) to the NCBI build 36 assembly. We found that 97.3% of mate-pairs had the expected mapping distance and orientation. Mate-pairs discordant in orientation or mapping distance were used to identify variants, and we required each event to be supported by at least two clones. In total, this strategy was used to identify 780 insertions, 1,494 deletions and 105 inversions (Figure <figr fid="F1">1</figr>; Table <tblr tid="T1">1</tblr>; Additional file <supplr sid="S3">3</supplr>). In an independent analysis of the same underlying sequencing data, we then captured SVs by examining the alignment profiles of 31,546,016 paired and unpaired reads to search for intra-alignment gaps <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>. The presence of an intra-alignment gap in the sequence read (query sequence) or in the reference genome (target sequence) would indicate a putative insertion or deletion event, respectively. The identification of such a 'split-read' alignment signature complements the mate-pair approach, as significantly smaller insertions and deletions can be discovered. We required at least two overlapping split-reads having an alignment gap &gt;10 bp to call a variant. A total of 8,511 insertions and 11,659 deletions ranging from 11 to 111,714 bp in size were identified (Figure <figr fid="F1">1</figr>; Table <tblr tid="T1">1</tblr>; Additional file <supplr sid="S4">4</supplr>).</p>
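<p>The two calling rules described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the 30% distance tolerance, the locus labels and all function names are assumptions introduced here, while the thresholds of two supporting clones or split-reads and an alignment gap greater than 10 bp are taken from the text.</p>
<p>
```python
from collections import Counter

# ASSUMPTIONS (illustrative only, not the authors' pipeline):
# - each mate-pair is summarised by its library size, the mapped distance
#   between its two ends, and whether the end orientation is as expected;
# - a mapped distance within +/- 30% of the library size counts as concordant;
# - candidate events are identified by a caller-supplied locus label.
TOLERANCE = 0.30
MIN_SUPPORT = 2      # from the text: at least two clones (or split-reads) per event
MIN_GAP_BP = 11      # from the text: alignment gap greater than 10 bp


def classify_mate_pair(library_bp, mapped_bp, orientation_ok):
    """Classify one mate-pair against its mapping on the reference."""
    if not orientation_ok:
        return "inversion"
    if mapped_bp > library_bp * (1 + TOLERANCE):
        # Ends map farther apart than the clone: sequence is missing in the sample.
        return "deletion"
    if library_bp * (1 - TOLERANCE) > mapped_bp:
        # Ends map closer together than the clone: the sample carries extra sequence.
        return "insertion"
    return "concordant"


def classify_split_read(gap_in_read_bp, gap_in_reference_bp):
    """Classify one read by its intra-alignment gap, following the text."""
    if gap_in_read_bp >= MIN_GAP_BP:
        return "insertion"   # gap in the read (query sequence)
    if gap_in_reference_bp >= MIN_GAP_BP:
        return "deletion"    # gap in the reference (target sequence)
    return None


def supported_events(calls, min_support=MIN_SUPPORT):
    """Keep (locus, kind) events backed by at least min_support observations."""
    counts = Counter(
        (locus, kind) for locus, kind in calls
        if kind not in (None, "concordant")
    )
    return sorted(event for event, n in counts.items() if n >= min_support)


# Hypothetical mate-pairs: (locus label, library size, mapped distance, orientation ok)
pairs = [
    ("locusA", 10_000, 18_500, True),   # stretched: deletion
    ("locusA", 10_000, 18_200, True),   # second clone supporting the same deletion
    ("locusB", 2_000, 2_100, True),     # concordant
    ("locusC", 10_000, 9_900, False),   # single inversion-like clone, unsupported
]
calls = [(locus, classify_mate_pair(lib, dist, ok)) for locus, lib, dist, ok in pairs]
print(supported_events(calls))
```
</p>
<p>In this toy input only the deletion at the hypothetical locusA is retained, because it is the only event supported by two clones; a real analysis would also need to cluster discordant clones by genomic position, which is not shown here.</p>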
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Clone library information</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S2.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>Mate-pair variants and comparison with various data sets</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S3.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>Split-read variants and comparison with various data sets</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S4.XLS">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Array-based variation</p>
</st>
<p>We used two ultra-high density custom comparative genomic hybridization (CGH) array sets and two commonly used SNP genotyping arrays to identify relative gains and losses. A significant amount of variation was detected from the two custom CGH arrays: an Agilent oligonucleotide array set with 24 million features (Agilent 24 M) <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>, and a NimbleGen oligonucleotide array set containing 42 million features (NimbleGen 42 M) <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>. The Agilent platform identified 194 duplications and 319 deletions, while the NimbleGen array set detected 366 gains and 358 losses, ranging in size from 439 bp to 852 kb, in Venter (Figure <figr fid="F1">1</figr>; Table <tblr tid="T1">1</tblr>; Additional files <supplr sid="S5">5</supplr> and <supplr sid="S6">6</supplr>). Furthermore, we scanned the Venter genome using Affymetrix SNP Array 6.0 and Illumina BeadChip 1 M, and the results are summarized in Table <tblr tid="T1">1</tblr> plus Additional files <supplr sid="S7">7</supplr> and <supplr sid="S8">8</supplr>.</p>
<suppl id="S5">
<title>
<p>Additional file 5</p>
</title>
<text>
<p>
<b>Agilent 24 M variants and comparison with various data sets</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S5.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S6">
<title>
<p>Additional file 6</p>
</title>
<text>
<p>
<b>NimbleGen 42 M variants and comparison with various data sets</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S6.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S7">
<title>
<p>Additional file 7</p>
</title>
<text>
<p>
<b>Affymetrix 6.0 variants and comparison with various data sets</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S7.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S8">
<title>
<p>Additional file 8</p>
</title>
<text>
<p>
<b>Illumina 1 M variants and comparison with various data sets</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S8.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<p>Most microarrays used for CNV analyses are designed based on the NCBI assemblies. Therefore, any region where the reference exhibits the deletion allele of an indel, or sequences mapping to gaps in the assembly, will not be targeted. In previous studies <abbrgrp>
<abbr bid="B16">16</abbr>
<abbr bid="B24">24</abbr>
</abbrgrp>, many unknown DNA segments were identified to have no or poor alignment to the NCBI reference when compared to the Celera R27C assembly. To capture genetic variation in such potentially novel sequences, we designed a custom Agilent 244 K array to target those scaffold sequences at least 500 bp in length. We then performed CGH on seven HapMap individuals and detected 231 regions (101 gains and 130 losses) in 161 scaffolds to be variable (Additional file <supplr sid="S9">9</supplr>). Of these, we found 44 gains and 7 losses in 36 Celera scaffolds were specific to Venter (Figure <figr fid="F1">1</figr>, Table <tblr tid="T1">1</tblr>). Using paired-end mapping, as well as cross-species genome comparison with the chimpanzee, we were able to find a placement in NCBI build 36 for 25 of 36 scaffolds that were copy number variable in Venter. Two of the scaffolds were mapped to regions containing assembly gaps, 15 of 25 anchored scaffolds corresponded to insertion events also detected elsewhere <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>, and the remaining eight represent new insertion findings (Additional file <supplr sid="S10">10</supplr>).</p>
<suppl id="S9">
<title>
<p>Additional file 9</p>
</title>
<text>
<p>
<b>Custom Agilent 244 K copy number variants</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S9.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S10">
<title>
<p>Additional file 10</p>
</title>
<text>
<p>
<b>Custom Agilent 244 K copy number variable-scaffolds anchoring information</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S10.XLS">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Validation of findings</p>
</st>
<p>We used several computational and experimental approaches to validate our SV findings. We performed experimental validation by PCR amplification and gel-sizing and confirmed 89 of the 96 (93%) SVs predicted by sequence analysis (Additional files <supplr sid="S11">11</supplr> and <supplr sid="S12">12</supplr>). Using quantitative real-time PCR (qPCR), we validated 20 of 25 (80%) CNVs detected by microarrays, and most of these CNVs were from the custom Agilent 244 K array covering sequences not in the NCBI assembly (Additional file <supplr sid="S13">13</supplr>). Inversion predictions were tested by fluorescence <it>in situ </it>hybridization (FISH) <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. In one such finding, a predicted 1.1-Mb inversion at 16p12 was identified to be homozygous in Venter and in all of the seven additional HapMap samples from four populations tested, suggesting that the reference at this locus represents a rare allele, or is incorrectly assembled (Additional file <supplr sid="S14">14</supplr>).</p>
<suppl id="S11">
<title>
<p>Additional file 11</p>
</title>
<text>
<p>
<b>Example of a PCR-validated insertion event with size 84 bp predicted by the split-read approach</b>. A pair of primers, separated by 497 bp, was designed surrounding the insertion site. PCR was run with these primers, and the presence of the insertion was resolved by gel electrophoresis. Starting from the right, DNA from five European controls, DNA from Venter and a negative control were added in lanes 1 to 5, lane 6 and lane 7, respectively.</p>
</text>
<file name="gb-2010-11-5-r52-S11.TIFF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S12">
<title>
<p>Additional file 12</p>
</title>
<text>
<p>
<b>List of validated variants and their primers and probes</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S12.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S13">
<title>
<p>Additional file 13</p>
</title>
<text>
<p>
<b>Example of a qPCR-validated gain in Venter relative to sample NA10851 as detected by the custom Agilent 244 K aCGH</b>. A 4.2-kb CNV was detected on the Celera scaffold GA_x5YUVVTY6, and by qPCR, we found that NA10851 had a heterozygous loss in that region, thus confirming a relative gain in Venter.</p>
</text>
<file name="gb-2010-11-5-r52-S13.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S14">
<title>
<p>Additional file 14</p>
</title>
<text>
<p>
<b>A common inversion on 16p12.2 validated by FISH</b>. <b>(a) </b>A 2-Mb website schematic of the region. This 1.1-Mb inversion was detected by the mate-pair method in Venter as seen in track 'B_Clone'. The track 'Inversions' shows that this inversion was annotated in three other studies <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>. <b>(b) </b>An image of a four-color FISH experiment revealing that Venter is homozygous for the 16p12.2 inverted allele. Four differentially labeled fosmid probes were scored in &gt;100 interphase FISH experiments, and the order of the probes in Venter was found in the vast majority of experiments (including in seven HapMap controls from four different populations) to be yellow-green-blue-pink. In the absence of the inversion, the order of the probes would be yellow-blue-green-pink, as depicted in the assembly schematic. Therefore, as discussed in the main text, our data suggest that the NCBI build 36 reference represents a rare allele, or may be incorrect.</p>
</text>
<file name="gb-2010-11-5-r52-S14.TIFF">
   <p>Click here for file</p>
</file>
</suppl>
<p>We then compared the SVs identified here with the previous assembly comparison-based analysis of the same genome <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>, and found that 11,140 variants were in common. We noticed that our multi-platform method excelled in calling large variants. In fact, even after excluding all of the small variants (&#8804; 10 bp) from the previous Levy <it>et al. </it>study <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>, we still observed that the current study tended to find larger SVs (an average of 1,909.3 bp in the current study versus 113.4 bp in the previous one). Additional file <supplr sid="S15">15</supplr> shows that the sensitivity of assembly comparison dropped as size increased to over 1 kb, and the proportion of larger SVs significantly increased as a result of the present study (Figure <figr fid="F2">2b, c</figr>).</p>
<suppl id="S15">
<title>
<p>Additional file 15</p>
</title>
<text>
<p>
<b>Comparative analysis of variants discovered in Levy <it>et al. </it>
</b>
<abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>
<b> and the current study</b>. The two graphs illustrate the proportion of SVs identified by the assembly comparison method, by our present combined multi-approach strategy (including mate-pair, split-read, CGH arrays and SNP arrays), and the proportion confirmed by both. The x-axis represents size range, while the numbers at the top indicate the total number of calls in a particular size range. As size increases, the number of variants called by assembly comparison decreases significantly, indicating that the method has limited sensitivity for detecting large calls. In contrast, our combined multi-approach strategy in the current study is better suited to finding large variation. <b>(a) </b>Size distribution of gains. <b>(b) </b>Size distribution of losses.</p>
</text>
<file name="gb-2010-11-5-r52-S15.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>Finally, we determined the number of calls in this study that were either verified by another platform in this study or found in the Database of Genomic Variants <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>. In total, we computationally confirmed 15,642 (65.6%) of our current calls: 6,301 were gains; 9,276 were losses; and 65 were inversions.</p>
</sec>
<sec>
<st>
<p>Cross-platform comparison</p>
</st>
<p>We performed an in-depth analysis of the characteristics of the variants detected by each of the methods. First, by contrasting against a population-based study <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>, we observed highly similar size estimates for the same underlying SVs between methods (Figure <figr fid="F3">3</figr>). With sufficient genome coverage of clones with accurate and tight insert size, the mate-pair method yields precise variation size. Similarly, the split-read approach gives nucleotide resolution breakpoints, while the high-density CGH and SNP arrays have dense probe coverage to accurately identify the start and end points of SVs. Overall, our multiple approaches are highly robust in estimating variant size.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Agreement between the non-redundant set of Venter CNVs and genotype-validated variable loci</p></caption><text>
   <p><b>Agreement between the non-redundant set of Venter CNVs and genotype-validated variable loci</b>. The agreement between sites identified by different detection methods was measured by the percentage of reciprocal overlap between the estimated size for the non-redundant set of Venter variants and the estimated size for the CNVs generated and genotyped in the Genome Structural Variation (GSV) population genetics study <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Two sites were considered overlapping if the reciprocal overlap among their estimated sizes was &#8805; 50%. The lower right corner plot summarizes the mean discrepancy between Venter and GSV loci sizes, as a proportion of the GSV-estimated CNV size.</p>
</text><graphic file="gb-2010-11-5-r52-3" hint_layout="single"/></fig>
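<p>The reciprocal-overlap criterion used for Figure 3 can be sketched in Python as follows. The half-open coordinate convention and the function names are assumptions of this illustration; only the 50% threshold comes from the figure legend.</p>
<p>
```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the smaller of the two overlap fractions for two intervals.

    Coordinates are treated as half-open [start, end); this convention
    is an assumption of the sketch.
    """
    shared = min(end_a, end_b) - max(start_a, start_b)
    if 0 >= shared:
        # The intervals do not overlap at all.
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def sites_overlap(site_a, site_b, threshold=0.5):
    """Apply the reciprocal-overlap criterion to two (start, end) sites."""
    return reciprocal_overlap(*site_a, *site_b) >= threshold
```
</p>
<p>Taking the smaller of the two fractions means that a small call nested inside a much larger one is not counted as the same site, even though the small call is fully covered.</p>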
<p>Next, we compared the variants discovered by the two whole genome CGH array sets, NimbleGen 42 M and Agilent 24 M, and investigated the primary reason for the discordance between the two data sets. Not surprisingly, a substantial portion of the discordant calls can be explained by the difference in probe coverage. In fact, approximately 70% of the unique calls on the NimbleGen 42 M array had inadequate probe coverage on the Agilent 24 M array to be able to call variants, and approximately 30% <it>vice versa </it>(Additional file <supplr sid="S16">16</supplr>). After that, we compared the number of calls uniquely identified by the SNP-genotyping microarrays, and we identified 12 and 0 novel SVs contributed by Affymetrix 6.0 and Illumina 1 M, respectively. Of the 12 new Affymetrix calls, 9 are located in complex regions containing blocks of segmental duplications.</p>
<suppl id="S16">
<title>
<p>Additional file 16</p>
</title>
<text>
<p>
<b>Cumulative distribution of probe coverage</b>. <b>(a) </b>Agilent 24 M array probe coverage across NimbleGen 42 M variants. The x-axis begins at 5, the minimum number of probes required to call variants on the Agilent array. Hence, the majority of the unconfirmed NimbleGen variants (approximately 70%) were targeted by fewer than five Agilent probes. <b>(b) </b>NimbleGen 42 M array probe coverage across Agilent 24 M variants. The x-axis begins at 10, which is the required number of probes for the NimbleGen array to make a call.</p>
</text>
<file name="gb-2010-11-5-r52-S16.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>Subsequently, when looking for enrichment of genomic features among variants detected by different approaches, we found that there was a significant enrichment (<it>P </it>&lt; 0.01) of short interspersed nuclear elements (SINEs) in deletions called by sequencing-based approaches (mate-pair and split-read), but not in deletions called by the microarrays. Microarrays have low sensitivity for detecting copy number change of SINEs (for example, Alu elements), as these regions cannot be uniquely targeted by short oligo probes, and over-saturation of probe fluorescence would prevent an accurate high copy count. Meanwhile, the sequencing methods employed here do not rely on alignments within the repeat itself, and consequently they are readily able to detect gains and losses of these high-copy repeats. The complete result for enrichment of SVs with various genomic features is shown in Additional file <supplr sid="S17">17</supplr>.</p>
<suppl id="S17">
<title>
<p>Additional file 17</p>
</title>
<text>
<p>
<b>A summary list of structural variant overlap with genomic features</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S17.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<p>Finally, one of the main challenges of genome assembly is to correctly assemble both alleles in regions of SV. To identify heterozygous events among the split-read indels, we searched for evidence of an alternative allele. Indels were determined to be heterozygous if two or more sequence reads could be aligned that supported the NCBI build 36 allele. From the split-read dataset alone, we identified 4,476 of 8,511 (52.6%) insertions and 6,906 of 11,659 (59.2%) deletions as heterozygous. Additionally, we found that of the 10,834 split-read indels that overlapped with results from the Levy <it>et al. </it>study <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>, 4,332 events annotated as heterozygous in our results were previously classified as homozygous (Additional file <supplr sid="S4">4</supplr>). These differences highlight the difficulty of assembling both alternative alleles in regions of SV, leading to an underestimate of the heterozygosity in Levy <it>et al. </it>
<abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>.</p>
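<p>The heterozygosity rule stated above (two or more reads supporting the NCBI build 36 allele) can be written as a short Python sketch. The function names and the input format, one reference-supporting read count per indel, are hypothetical; how those reads are aligned and counted is outside the scope of the sketch.</p>
<p>
```python
def is_heterozygous(n_reference_reads, min_reference_reads=2):
    """Call an indel heterozygous when enough reads support the
    reference allele; the default of two follows the text."""
    return n_reference_reads >= min_reference_reads


def heterozygous_summary(reference_read_counts):
    """Return (number heterozygous, total, percentage) for a set of indels.

    reference_read_counts holds, for each split-read indel, the number
    of reads that align to the reference allele (hypothetical input).
    """
    counts = list(reference_read_counts)
    n_het = sum(is_heterozygous(n) for n in counts)
    percent = 100.0 * n_het / len(counts) if counts else 0.0
    return n_het, len(counts), round(percent, 1)
```
</p>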
</sec>
<sec>
<st>
<p>The total variation content of the Venter genome</p>
</st>
<p>In an attempt to estimate the total variation content in the Venter genome, we combined the SVs previously described in the Venter genome in the Levy <it>et al. </it>paper <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> with the variants discovered in this study, to generate a non-redundant set of variants. We determined that 48,777,466 bp was structurally variable, of which 19,981,062 bp belonged to gains, 19,539,369 bp to losses, and 9,257,035 bp to balanced inversions (Table <tblr tid="T1">1</tblr>). The vast majority of this variation (83.3%, or 40,625,059 bp) was discovered in the current analyses of the Venter genome. This contribution of novel calls underscores the importance of using multiple analysis strategies for detecting SV in the human genome. See Additional file <supplr sid="S18">18</supplr> for the location of SVs &gt;1 kb, and Additional files <supplr sid="S19">19</supplr>, <supplr sid="S20">20</supplr> and <supplr sid="S21">21</supplr> for a complete list of variation in the Venter genome.</p>
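<p>One way to total the base pairs covered by a non-redundant set of calls is to merge overlapping intervals before summing, as in the Python sketch below. This is an assumption made for illustration: the merging rule actually used to build the non-redundant set is not described in this section, and the function name and input format are hypothetical.</p>
<p>
```python
from collections import defaultdict


def non_redundant_bp(calls):
    """Sum the base pairs covered by calls after merging overlaps.

    calls is an iterable of (chromosome, start, end) tuples with
    half-open coordinates; calls of one variant type are assumed.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:
                # Gap before the next call: close the current block.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                # Overlapping or adjacent call: extend the block.
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total
```
</p>
<p>Running the function separately on gains, losses and inversions would give per-class totals that can be added, provided that the classes are kept distinct as in Table 1.</p>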
<suppl id="S18">
<title>
<p>Additional file 18</p>
</title>
<text>
<p>
<b>Genome-wide distribution of large SVs in Venter</b>. The sites of 2,772 SVs whose position spans &gt;1 kb are shown. Red bars represent insertion or duplication, blue bars represent deletions, and green bars represent inversions.</p>
</text>
<file name="gb-2010-11-5-r52-S18.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S19">
<title>
<p>Additional file 19</p>
</title>
<text>
<p>
<b>A non-redundant set of Venter insertions and duplications</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S19.ZIP">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S20">
<title>
<p>Additional file 20</p>
</title>
<text>
<p>
<b>A non-redundant set of Venter deletions</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S20.ZIP">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S21">
<title>
<p>Additional file 21</p>
</title>
<text>
<p>
<b>A non-redundant set of Venter inversions</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S21.XLSX">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Comparison with other personal genomes</p>
</st>
<p>When we compared the complete set of Venter's SVs with those from other published genomes <abbrgrp>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
</abbrgrp> (Additional file <supplr sid="S1">1</supplr>), we found that 209,493/808,345 (25.9%) of the Venter variants overlapped variants described in one or more of the other six studies. Upon examining the size distribution of variants from different studies, particularly the size of insertions and duplications, we realized that studies based primarily on next generation sequencing (NGS) data for variation calling were unable to identify calls in certain size ranges (Figure <figr fid="F4">4</figr>). These results further indicate that, at present, multiple approaches are needed to capture SVs across the entire size spectrum. The most obvious limitation is that short NGS reads/inserts fail to capture insertion events greater than the size of the reads/inserts.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Difference in the size distributions of reported indels/CNVs in published personal genome sequencing studies</p></caption><text>
   <p><b>Difference in the size distributions of reported indels/CNVs in published personal genome sequencing studies</b>. The graphs show variation found in a few personal genome sequencing studies <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. These diagrams indicate that multiple approaches are needed for better detection of CNVs. Here, the total variant set in the Venter genome found in both the Levy <it>et al. </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and the current study is displayed. Unlike the current study where the size of mate-pair indels is equal to the difference between the mapping distance and the expected insert size, the SVs in the Ahn <it>et al. </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp> study are only based on the mapping distance. Besides the NGS data, we have also included the variants detected by the high density Agilent 24 M data in the Kim <it>et al. </it><abbrgrp><abbr bid="B7">7</abbr></abbrgrp> study. In Wheeler <it>et al. </it><abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, insertions identified by intra-read alignment would be limited by the size of the sequencing read; hence, large insertions beyond the read length were not detected. Wang <it>et al. </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, Kim <it>et al.</it>, and McKernan <it>et al. </it><abbrgrp><abbr bid="B8">8</abbr></abbrgrp> detected small variants based on split-reads and large ones based on mate-pairs and microarrays, but failed to detect variation between these size ranges. Also, see Additional file <supplr sid="S1">1</supplr>. <b>(a) </b>Insertion and duplication size distribution. <b>(b) </b>Deletion size distribution.</p>
</text><graphic file="gb-2010-11-5-r52-4" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>Functional importance of structural variation</p>
</st>
<p>Next, we analyzed the complete set of SVs in Venter for overlap with features of the genome with known functional significance, which might influence health outcomes (Table <tblr tid="T2">2</tblr>). We found 189 genes to be completely encompassed by gains or losses, 4,867 non-redundant genes (3,126 impacted by gains and 3,025 by losses) whose exons were impacted, and 573 of these to be in the Online Mendelian Inheritance in Man (OMIM) Disease database (Additional files <supplr sid="S22">22</supplr>, <supplr sid="S23">23</supplr>, <supplr sid="S24">24</supplr>, <supplr sid="S25">25</supplr> and <supplr sid="S26">26</supplr>). However, there was an overall paucity of SV (<it>P </it>&#8805; 0.999) overlapping exonic sequences of genes associated with autosomal dominant/recessive diseases, cancer disease, and imprinted and dosage-sensitive genes. In general, there is a paucity of variation in both exonic and regulatory sequences, such as enhancers, promoters and CpG islands, in the genome of this individual.</p>
<suppl id="S22">
<title>
<p>Additional file 22</p>
</title>
<text>
<p>
<b>List of Venter gains that overlap with exons of RefSeq genes</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S22.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S23">
<title>
<p>Additional file 23</p>
</title>
<text>
<p>
<b>List of Venter losses that overlap with exons of RefSeq genes</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S23.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S24">
<title>
<p>Additional file 24</p>
</title>
<text>
<p>
<b>List of Venter gains that overlap with exons of OMIM genes</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S24.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S25">
<title>
<p>Additional file 25</p>
</title>
<text>
<p>
<b>List of Venter losses that overlap with exons of OMIM genes</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S25.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S26">
<title>
<p>Additional file 26</p>
</title>
<text>
<p>
<b>A detailed list of genes that are completely encompassed by non-redundant gains and losses</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S26.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<tbl hint_layout="double" id="T2"><title><p>Table 2</p></title><caption><p>Genomic landscape and structural variants in the Venter genome*</p></caption><tblbdy cols="7">
      <r>
         <c>
            <p/>
         </c>
         <c cspan="3" ca="center">
            <p>
               <b>Total non-redundant gains<sup>b</sup></b>
            </p>
         </c>
         <c cspan="3" ca="center">
            <p>
               <b>Total non-redundant losses<sup>c</sup></b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="3">
            <hr/>
         </c>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Genomic feature (number of entries)<sup>a</sup></b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Number of (%) genomic features</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Number of (%) structural variants</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b><it>P</it>-values</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Number of (%) genomic features</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Number of (%) structural variants</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b><it>P</it>-values</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>RefSeq gene loci<sup>d </sup>(20,174)</p>
         </c>
         <c ca="center">
            <p>14,268 (70.72%)</p>
         </c>
         <c ca="center">
            <p>159,250 (38.17%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>13,951 (69.15%)</p>
         </c>
         <c ca="center">
            <p>149,568 (38.26%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>RefSeq gene entire transcript loci<sup>e </sup>(20,174)</p>
         </c>
         <c ca="center">
            <p>101 (0.50%)</p>
         </c>
         <c ca="center">
            <p>41 (0.01%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>91 (0.45%)</p>
         </c>
         <c ca="center">
            <p>47 (0.01%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>RefSeq gene exons<sup>f </sup>(20,174)</p>
         </c>
         <c ca="center">
            <p>3,126 (15.50%)</p>
         </c>
         <c ca="center">
            <p>3,890 (0.93%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>3,025 (14.99%)</p>
         </c>
         <c ca="center">
            <p>3,723 (0.95%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Enhancer elements (837)</p>
         </c>
         <c ca="center">
            <p>80 (9.56%)</p>
         </c>
         <c ca="center">
            <p>85 (0.02%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>84 (10.04%)</p>
         </c>
         <c ca="center">
            <p>93 (0.02%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Promoters (20,174)</p>
         </c>
         <c ca="center">
            <p>2,007 (9.95%)</p>
         </c>
         <c ca="center">
            <p>2,071 (0.50%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>1,812 (8.98%)</p>
         </c>
         <c ca="center">
            <p>1,922 (0.49%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Stop codons<sup>g </sup>(30,885)</p>
         </c>
         <c ca="center">
            <p>225 (0.73%)</p>
         </c>
         <c ca="center">
            <p>99 (0.02%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>272 (0.88%)</p>
         </c>
         <c ca="center">
            <p>134 (0.03%)</p>
         </c>
         <c ca="center">
            <p>0.563</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>OMIM disease gene loci (3,737)</p>
         </c>
         <c ca="center">
            <p>1,658 (44.37%)</p>
         </c>
         <c ca="center">
            <p>20,589 (4.93%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>1,664 (44.53%)</p>
         </c>
         <c ca="center">
            <p>19,396 (4.96%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>OMIM disease gene exons (3,737)</p>
         </c>
         <c ca="center">
            <p>367 (9.82%)</p>
         </c>
         <c ca="center">
            <p>458 (0.11%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>383 (10.25%)</p>
         </c>
         <c ca="center">
            <p>492 (0.13%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Autosomal dominant gene loci (316)</p>
         </c>
         <c ca="center">
            <p>247 (78.16%)</p>
         </c>
         <c ca="center">
            <p>2,773 (0.66%)</p>
         </c>
         <c ca="center">
            <p>0.023</p>
         </c>
         <c ca="center">
            <p>245 (77.53%)</p>
         </c>
         <c ca="center">
            <p>2,593 (0.66%)</p>
         </c>
         <c ca="center">
            <p>0.031</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Autosomal dominant gene exons (316)</p>
         </c>
         <c ca="center">
            <p>60 (18.99%)</p>
         </c>
         <c ca="center">
            <p>70 (0.02%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>64 (20.25%)</p>
         </c>
         <c ca="center">
            <p>78 (0.02%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Autosomal recessive gene loci (472)</p>
         </c>
         <c ca="center">
            <p>386 (81.78%)</p>
         </c>
         <c ca="center">
            <p>3,931 (0.94%)</p>
         </c>
         <c ca="center">
            <p>0.065</p>
         </c>
         <c ca="center">
            <p>402 (85.17%)</p>
         </c>
         <c ca="center">
            <p>3,749 (0.96%)</p>
         </c>
         <c ca="center">
            <p>0.009</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Autosomal recessive gene exons (472)</p>
         </c>
         <c ca="center">
            <p>58 (12.29%)</p>
         </c>
         <c ca="center">
            <p>78 (0.02%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>86 (18.22%)</p>
         </c>
         <c ca="center">
            <p>109 (0.03%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cancer disease gene loci (363)</p>
         </c>
         <c ca="center">
            <p>301 (82.92%)</p>
         </c>
         <c ca="center">
            <p>4,202 (1.01%)</p>
         </c>
         <c ca="center">
            <p>0.651</p>
         </c>
         <c ca="center">
            <p>307 (84.57%)</p>
         </c>
         <c ca="center">
            <p>3,899 (1.00%)</p>
         </c>
         <c ca="center">
            <p>0.821</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cancer disease gene exons (363)</p>
         </c>
         <c ca="center">
            <p>66 (18.18%)</p>
         </c>
         <c ca="center">
            <p>85 (0.02%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>71 (19.56%)</p>
         </c>
         <c ca="center">
            <p>98 (0.03%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Dosage sensitive gene loci (145)</p>
         </c>
         <c ca="center">
            <p>120 (82.76%)</p>
         </c>
         <c ca="center">
            <p>2,995 (0.72%)</p>
         </c>
         <c ca="center">
            <p>0.604</p>
         </c>
         <c ca="center">
            <p>125 (86.21%)</p>
         </c>
         <c ca="center">
            <p>2,794 (0.71%)</p>
         </c>
         <c ca="center">
            <p>0.728</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Dosage sensitive gene exons (145)</p>
         </c>
         <c ca="center">
            <p>39 (26.90%)</p>
         </c>
         <c ca="center">
            <p>51 (0.01%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>41 (28.28%)</p>
         </c>
         <c ca="center">
            <p>58 (0.01%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Genomic disorders (52)</p>
         </c>
         <c ca="center">
            <p>50 (96.15%)</p>
         </c>
         <c ca="center">
            <p>14,178 (3.40%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>51 (98.08%)</p>
         </c>
         <c ca="center">
            <p>13,373 (3.42%)</p>
         </c>
         <c ca="center">
            <p>0.996</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pharmacogenetic gene loci (186)</p>
         </c>
         <c ca="center">
            <p>97 (52.15%)</p>
         </c>
         <c ca="center">
            <p>853 (0.20%)</p>
         </c>
         <c ca="center">
            <p>0.517</p>
         </c>
         <c ca="center">
            <p>96 (51.61%)</p>
         </c>
         <c ca="center">
            <p>838 (0.21%)</p>
         </c>
         <c ca="center">
            <p>0.105</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pharmacogenetic gene exons (186)</p>
         </c>
         <c ca="center">
            <p>21 (11.29%)</p>
         </c>
         <c ca="center">
            <p>27 (0.01%)</p>
         </c>
         <c ca="center">
            <p>0.998</p>
         </c>
         <c ca="center">
            <p>23 (12.37%)</p>
         </c>
         <c ca="center">
            <p>29 (0.01%)</p>
         </c>
         <c ca="center">
            <p>0.984</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Imprinted gene loci (59)</p>
         </c>
         <c ca="center">
            <p>39 (66.10%)</p>
         </c>
         <c ca="center">
            <p>405 (0.10%)</p>
         </c>
         <c ca="center">
            <p>0.989</p>
         </c>
         <c ca="center">
            <p>37 (62.71%)</p>
         </c>
         <c ca="center">
            <p>378 (0.10%)</p>
         </c>
         <c ca="center">
            <p>0.982</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Imprinted gene exons (59)</p>
         </c>
         <c ca="center">
            <p>13 (22.03%)</p>
         </c>
         <c ca="center">
            <p>15 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.998</p>
         </c>
         <c ca="center">
            <p>11 (18.64%)</p>
         </c>
         <c ca="center">
            <p>13 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MicroRNAs (685)</p>
         </c>
         <c ca="center">
            <p>8 (1.17%)</p>
         </c>
         <c ca="center">
            <p>9 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.785</p>
         </c>
         <c ca="center">
            <p>11 (1.61%)</p>
         </c>
         <c ca="center">
            <p>9 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.836</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GWAS loci (419)</p>
         </c>
         <c ca="center">
            <p>415 (99.05%)</p>
         </c>
         <c ca="center">
            <p>9,413 (2.26%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>416 (99.28%)</p>
         </c>
         <c ca="center">
            <p>8,852 (2.26%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GWAS SNPs (419)</p>
         </c>
         <c ca="center">
            <p>1 (0.24%)</p>
         </c>
         <c ca="center">
            <p>1 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.786</p>
         </c>
         <c ca="center">
            <p>2 (0.48%)</p>
         </c>
         <c ca="center">
            <p>2 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.810</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CpG islands (14,867)</p>
         </c>
         <c ca="center">
            <p>287 (1.93%)</p>
         </c>
         <c ca="center">
            <p>1,516 (0.36%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>299 (2.01%)</p>
         </c>
         <c ca="center">
            <p>1,508 (0.39%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DNAseI hypersensitivity sites (95,709)</p>
         </c>
         <c ca="center">
            <p>6,524 (6.82%)</p>
         </c>
         <c ca="center">
            <p>7,165 (1.72%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>6,392 (6.68%)</p>
         </c>
         <c ca="center">
            <p>6,914 (1.77%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Recombination hotspots (32,996)</p>
         </c>
         <c ca="center">
            <p>16,839 (51.03%)</p>
         </c>
         <c ca="center">
            <p>30,315 (7.27%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
         <c ca="center">
            <p>16,211 (49.13%)</p>
         </c>
         <c ca="center">
            <p>28,407 (7.27%)</p>
         </c>
         <c ca="center">
            <p>0.000</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Segmental duplications (51,809)</p>
         </c>
         <c ca="center">
            <p>17,172 (33.14%)</p>
         </c>
         <c ca="center">
            <p>13,864 (3.32%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>16,518 (31.88%)</p>
         </c>
         <c ca="center">
            <p>13,177 (3.37%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Ultra-conserved elements (481)</p>
         </c>
         <c ca="center">
            <p>2 (0.42%)</p>
         </c>
         <c ca="center">
            <p>2 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>2 (0.42%)</p>
         </c>
         <c ca="center">
            <p>2 (0.00%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Affy 6.0 SNPs<sup>h </sup>(907,691)</p>
         </c>
         <c ca="center">
            <p>1,556 (0.17%)</p>
         </c>
         <c ca="center">
            <p>389 (0.09%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>3,022 (0.33%)</p>
         </c>
         <c ca="center">
            <p>934 (0.24%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Illumina 1 M SNPs<sup>i </sup>(1,048,762)</p>
         </c>
         <c ca="center">
            <p>2,318 (0.22%)</p>
         </c>
         <c ca="center">
            <p>601 (0.14%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
         <c ca="center">
            <p>4,789 (0.46%)</p>
         </c>
         <c ca="center">
            <p>1,536 (0.39%)</p>
         </c>
         <c ca="center">
            <p>0.999</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>*This table shows how structural variation affects different functional annotations and sequence characteristics in the Venter genome. The leftmost column shows the name and total number of each genomic feature. The rest of the table is divided between gains and losses. Within the gain category, the first column shows the number (and percentage of the total) of genomic features impacted, the second column shows the corresponding number (and percentage of the total) of gain variants, and the last column shows the significance of the overlap as determined by simulations. An identical format is used for the losses. <sup>a</sup>See Additional file <supplr sid="S17">17</supplr> for a list of data sources. <sup>b</sup>Based on a non-redundant list of 417,206 gains and insertions detected in this and the Levy <it>et al. </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp> study of the Venter genome. <sup>c</sup>Based on a non-redundant list of 390,973 deletions detected in this and the Levy <it>et al. </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp> study of the Venter genome. <sup>d</sup>Genes where a structural variant resides anywhere within the transcript (exonic and intronic). <sup>e</sup>Genes from the RefSeq data set where the entire transcript locus is encompassed by the structural variant. <sup>f</sup>Genes from the RefSeq data set where exonic sequence is impacted by the structural variant. The non-redundant number of genes altered in some way by duplications and deletions is 4,867. <sup>g</sup>Structural variants that overlap/impact a stop codon from the RefSeq gene set. <sup>h</sup>Probes on the Affymetrix 6.0 commercial array. <sup>i</sup>Probes on the Illumina 1 M array. GWAS, genome-wide association studies; OMIM, Online Mendelian Inheritance in Man.</p>
   </tblfn></tbl>
<p>Currently, direct-to-consumer testing companies and genome-wide association studies mainly use microarray-based SNP data <abbrgrp>
<abbr bid="B26">26</abbr>
<abbr bid="B27">27</abbr>
</abbrgrp>, but SVs are typically not considered. Venter indels/CNVs, however, overlap with 4,565 and 7,047 of the SNPs on the Affymetrix SNP Array 6.0 and Illumina BeadChip 1 M products, respectively (two commonly used arrays), potentially impacting genotype calling, most notably when deletions are involved.</p>
<p>Moreover, our attempts to impute SV calls using tagging-SNPs captured 308 of 405 (76.0%) Venter bi-allelic SVs for which we could infer genotypes (Additional file <supplr sid="S27">27</supplr>) <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>. Based on population data, rare SVs with minor allele frequency &#8804; 0.05 showed the lowest correlation with surrounding SNPs, indicating that these SVs were the least imputable (Figure <figr fid="F5">5</figr>). The fraction of imputable SVs will be even lower when multi-allelic and complex SVs are considered because the new mutation rate at these sites is higher.</p>
<suppl id="S27">
<title>
<p>Additional file 27</p>
</title>
<text>
<p>
<b>Comparison of Venter SVs with population-based genotyped and SNP-imputable CNVs</b>.</p>
</text>
<file name="gb-2010-11-5-r52-S27.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Tagging pattern for HuRef SVs as a function of their minor allele frequency (MAF)</p></caption><text>
   <p><b>Tagging pattern for HuRef SVs as a function of their minor allele frequency (MAF)</b>. Linkage disequilibrium is depicted as the best <it>r</it><sup>2 </sup>between an SV and a HapMap SNP in 120 Europeans (CEU). There were a total of 405 bi-allelic polymorphic SV sites of overlap between GSV and HuRef loci; for 24% of these SV loci, the best HapMap SNP has <it>r</it><sup>2 </sup>&lt; 0.8 in CEU, a cutoff below which HuRef CNVs would not be imputed simply by SNP detection. The line graph corresponds to the left y-axis, while the bar graph corresponds to the right y-axis. It should be noted that this analysis was performed on a small subset of bi-allelic SVs and that the ability to impute SVs from common SNPs would be even lower for the full set of SVs.</p>
</text><graphic file="gb-2010-11-5-r52-5" hint_layout="single"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>Human geneticists have long sought to know the extent of genetic variation. Here, in the most comprehensive analysis to date, we present the latest estimate: genetic variation of greater than 1% within an individual genome. Using multiple computational and experimental approaches, this study substantially expands on the SV map initially constructed by Levy and colleagues <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>; more than 80% of the total 48,777,466 structurally variable bases were not reported in the original sequencing of the Venter genome.</p>
<p>This study differs from previous studies in several ways. Our mate-pair approach makes use of multiple clone insert sizes, ranging from 2 to 37 kb, which enables us to detect a wider size range of variants than previous studies focused on paired-end mapping <abbrgrp>
<abbr bid="B15">15</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>. Furthermore, the long sequence reads used here increase alignment accuracy and enable the identification of intra-alignment gaps. Using microarrays, we are able to identify large variants that can be challenging to identify by sequencing.</p>
<p>Our results also highlight that each variation-discovery strategy has limitations and that no single approach can capture the entire spectrum of genetic variation, which emphasizes the importance of applying multiple strategies in SV detection. Figure <figr fid="F4">4</figr> shows that the variation reported by other personal genome sequencing studies, which relied almost exclusively on NGS technology, is substantially lower than that in the Venter annotation across many size ranges.</p>
<p>There are still some regions, such as heterochromatin (Additional file <supplr sid="S18">18</supplr>) and highly identical segmental duplication regions, where all of the current approaches have limited detection capabilities. To prevent false discovery, we used stringent alignment criteria and excluded alignments to multiple high-identity sequences; we will therefore likely miss variants within or flanking these sequences. Insufficient probe coverage and low intensity ratio fold-change also prevent microarrays from capturing CNVs in highly repetitive sequences (for example, Alu elements). As such, we suspect there are more variants to be discovered, but their ascertainment will require specialized experimental <abbrgrp>
<abbr bid="B18">18</abbr>
<abbr bid="B28">28</abbr>
</abbrgrp> and algorithmic <abbrgrp>
<abbr bid="B29">29</abbr>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
</abbrgrp> approaches. Further increases in read-depth can yield new variants. Indeed, the greatest relative number of SVs discovered in Venter is in the 10-kb size range (Figure <figr fid="F2">2</figr>), corresponding to the interval with the highest clone coverage <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> (Additional file <supplr sid="S2">2</supplr>). As expected, our results also show that using several libraries with different insert sizes leads to increased variation discovery.</p>
<p>The importance of SV to gene expression (direct and indirect) <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>, protein structure <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>, and chromosome stability <abbrgrp>
<abbr bid="B34">34</abbr>
<abbr bid="B35">35</abbr>
</abbrgrp> is being increasingly recognized in normal development and disease <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>. At the same time we show that SVs are: 1, grossly under-represented in published NGS sequencing projects; 2, not always imputable by SNP-based association; 3, ubiquitous along chromosomes impacting all known functional genomic features; and 4, often large, complex, and under negative or purifying selection <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>. Coupling these observations with conjectures that prophylactic decisions will be best informed by higher-penetrance rare alleles <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp> and that common SNPs explain only a proportion of heritability <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp> argues persuasively that SVs should gain more prominence in genomic medicine.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Our results present the most thorough estimate to date of the total complement of genetic variation across the entire size spectrum in a human genome. Our findings indicate that, to date, NGS-based personal genome studies, despite having generated a significant amount of valuable genomic information, have captured only a fraction of SVs, with substantial gaps in discovery at specific points along the size range of variation. Our data indicate that SV discovery is largely dependent on the strategy used, that presently no single approach can readily capture all types of variation, and that a combination of strategies is required. The data also show that structural variation impacts many genes that have been linked to human disease phenotypes, and that interpretation of these data is complex <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. Current genotyping services offered in the personal genomics field do not always include screening for SVs, and we find that interpretation of current SNP-based screening may be significantly impacted by the existence of SVs. We also show that many SVs will not be amenable to capture using imputation strategies from high density SNP data, arguing for direct detection of SVs as a complement to SNP analysis.</p>
</sec>
<sec>
<st>
<p>Materials and methods</p>
</st>
<sec>
<st>
<p>Sequencing-based analysis</p>
</st>
<p>The sequence data of J Craig Venter's genome (or the Venter genome) used for analysis was originally produced through experiments performed in the Venter <it>et al. </it>
<abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp> and Levy <it>et al. </it>
<abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp> studies. The sequence trace data and information files were downloaded from NCBI. In this study, we aligned 31,546,016 Venter sequences to the NCBI human genome assembly build 36 using BLAT <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp>. For paired-end mapping, the optimal placement of clone ends was determined by a modified version of the scoring scheme used in Tuzun <it>et al. </it>
<abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>. We categorized mate-pairs whose mapped distance was more than three standard deviations below the expected clone size as putative insertions, those more than three standard deviations above it as putative deletions, and those mapping in the wrong orientation as putative inversions. We required each variant to be confirmed by at least two clones, and for indels, we required the clones to be from libraries of the same average insert size (2 kb, 10 kb or 37 kb). To identify small variants, the read alignment profiles were further examined for intra-alignment gaps greater than 10 bp in size. Two independent 'split-reads' were required to call a putative variant.</p>
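<p>The mate-pair classification rule can be illustrated with a minimal sketch. It assumes the usual paired-end mapping convention that a clone mapping shorter than expected on the reference implies an insertion in the sample, and one mapping longer than expected implies a deletion. The function name, its arguments and the standard deviation used in the example are hypothetical and are not taken from the scripts used in this study.</p>
<p>
```python
def classify_mate_pair(mapped_span, orientation_ok, expected_size, sd, n_sd=3.0):
    """Classify one mate-pair from the distance between its mapped clone ends.

    mapped_span    distance between the two ends on the reference (bp)
    orientation_ok True when the two ends map in the expected orientation
    expected_size  mean insert size of the clone library (bp)
    sd             standard deviation of that library's insert size (bp)
    """
    if not orientation_ok:
        return "putative inversion"
    deviation = mapped_span - expected_size
    if deviation > n_sd * sd:
        # Ends map too far apart: sequence present in the reference is
        # missing from the clone.
        return "putative deletion"
    if -deviation > n_sd * sd:
        # Ends map too close together: the clone carries extra sequence.
        return "putative insertion"
    return "concordant"


# Illustrative 10 kb library with an assumed standard deviation of 1 kb.
for span in (10_500, 14_000, 6_000):
    print(span, classify_mate_pair(span, True, 10_000, 1_000))
```
</p>
<p>Under the criteria described above, a single discordant mate-pair would not be sufficient on its own; at least two clones from libraries of the same average insert size would need to support the same call.</p>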
</sec>
<sec>
<st>
<p>Array-based analysis</p>
</st>
<p>An Agilent 24 million features CGH array set (Agilent 24 M) was designed with 23.5 million 60-mer oligonucleotide probes tiled along the NCBI build 36 assembly. The Venter genomic DNA was co-hybridized with the female sample NA15510 from the Polymorphism Discovery Resource <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>. The statistical algorithm ADM-2 by Agilent Technologies was used to identify CNVs based on the combined log<sub>2</sub> ratios. Similar experimental procedures and analyses are described in other studies <abbrgrp>
<abbr bid="B7">7</abbr>
<abbr bid="B41">41</abbr>
</abbrgrp>. Additionally, a custom NimbleGen 42 million features CGH microarray (NimbleGen 42 M) was used in this study - its design, experimental procedures and data analysis have been described in detail elsewhere <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B22">22</abbr>
</abbrgrp>. In this experiment too, Venter genomic DNA was co-hybridized with the sample NA15510. For both the Agilent 24 M and NimbleGen 42 M arrays, CNVs that had &gt;50% reciprocal overlap with, and the opposite orientation to, variants identified in NA15510 in Conrad <it>et al. </it>
<abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp> were removed, as these were specific to the reference sample.</p>
<p>The Venter sample was also run on the Affymetrix SNP Array 6.0 and Illumina BeadChip 1 M genotyping arrays. We followed the protocol recommended by the manufacturers. For Affymetrix 6.0, the default parameters in the BirdSeed v2 algorithm were used to perform SNP calling. Partek Genomics Suite (Partek Inc., St. Louis, Missouri, USA), Genotyping Console (Affymetrix, Inc., Santa Clara, California, USA), BirdSuite <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp> and iPattern (J Zhang <it>et al.</it>, manuscript submitted) were used to call CNVs. For Illumina 1 M, the SNP calling was done using the BeadStudio software. QuantiSNP <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp> and iPattern were used to identify CNVs. For both platforms, only variants confirmed by at least two calling algorithms were included in the final set of calls.</p>
<p>The Agilent Custom Human 244 K CGH array (Agilent 244 K) was designed to target 9,018 sequences &gt;500 bp in length that were annotated as 'unmatched' sequences in Khaja <it>et al. </it>
<abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. CGH experiments were performed with genomic DNA from Venter and six HapMap samples, hybridized against reference NA10851. Feature extraction and normalization were performed using the Agilent feature extraction software. The programs ADM-1 in the DNA Analytics 4.0 suite (Agilent Technologies, Santa Clara, California, USA), and GADA <abbrgrp>
<abbr bid="B44">44</abbr>
</abbrgrp> were independently used to call CNVs, and those that were confirmed by both algorithms were then used in this study.</p>
</sec>
<sec>
<st>
<p>Non-redundant variant data set</p>
</st>
<p>To generate a non-redundant set of Venter variants, we combined the lists of SVs generated by each approach. For CNVs, two calls were considered the same if they shared a minimum of 50% reciprocal overlap in size; for inversions, we required that they shared at least one boundary. For calls indicated to be the same variant, we recorded the one with the best size/boundary estimate (with preference given to assembly comparison, then split-read, NimbleGen 42 M, Agilent 24 M, mate-pair, Affymetrix 6.0, and Illumina 1 M, in that order). For this analysis, we excluded variants called on the custom Agilent 244 K arrays.</p>
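<p>As an illustration of the merging rule for CNVs, the following sketch computes reciprocal overlap and keeps, for each group of matching calls, the call from the most preferred approach. The greedy strategy, the tuple layout and the function names are assumptions made for this example only; the original analysis may have grouped calls differently, and inversions (matched by shared boundaries) are not handled here.</p>
<p>
```python
# Preference order given in the text, best size/boundary estimate first.
PRIORITY = [
    "assembly comparison", "split-read", "NimbleGen 42 M", "Agilent 24 M",
    "mate-pair", "Affymetrix 6.0", "Illumina 1 M",
]


def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions; intervals must have positive length."""
    shared = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def merge_calls(calls, min_overlap=0.5):
    """Greedy merge of (chrom, start, end, kind, source) tuples.

    Calls are visited from the most to the least preferred source, and a call
    is dropped when it matches an already kept call of the same kind on the
    same chromosome with at least min_overlap reciprocal overlap.
    """
    ranked = sorted(calls, key=lambda call: PRIORITY.index(call[4]))
    kept = []
    for call in ranked:
        duplicate = any(
            call[0] == k[0] and call[3] == k[3]
            and reciprocal_overlap(call[1], call[2], k[1], k[2]) >= min_overlap
            for k in kept
        )
        if not duplicate:
            kept.append(call)
    return kept


# Hypothetical calls: the first two describe the same deletion.
example = [
    ("chr1", 100, 200, "loss", "mate-pair"),
    ("chr1", 110, 210, "loss", "split-read"),
    ("chr1", 1000, 1100, "loss", "Illumina 1 M"),
]
print(merge_calls(example))
```
</p>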
</sec>
<sec>
<st>
<p>PCR and quantitative real-time PCR validation</p>
</st>
<p>We used multiple computational and experimental approaches to validate SVs found in this project. PCR primers were designed to target flanking sequences of indels detected by sequencing-based methods, such that PCR products representing the different alleles could be differentiated on a 1.5% agarose gel. DNA from Venter and five HapMap individuals of European ancestry was tested in PCR experiments. Amplifications and deletions detected by CGH arrays were tested by qPCR. DNA from Venter and six additional control individuals was used to assess the variability in copy number. Each assay was run in triplicate and the <it>FOXP2 </it>gene was used as the reference for relative quantification. See Additional file <supplr sid="S12">12</supplr> for all primer sequences.</p>
</sec>
<sec>
<st>
<p>FISH validation</p>
</st>
<p>To validate large variants, FISH experiments were performed using fosmid clones as probes on a lymphoblastoid cell line from Venter and seven other HapMap individuals. Five metaphases were first imaged to check for correct chromosome localization and hybridization, and then interphase FISH was performed to validate predicted inversions, similar to the protocol outlined in the Feuk <it>et al. </it>study <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp> with the addition of the aqua probe, DEAC-5-dUTP (Perkin Elmer, Waltham, Massachusetts, USA; NEL455).</p>
</sec>
<sec>
<st>
<p>Overlap analysis</p>
</st>
<p>Overlap analyses against other datasets and genomic features, and between subsets of data in the current paper, were performed using custom Perl scripts. When comparing variants, two sites were considered overlapping if the reciprocal overlap between their estimated sizes was &#8805; 50%. Data sources used for the annotations of overlaps with genomic features are listed in Additional file <supplr sid="S17">17</supplr>. To evaluate significance, we created 1,000 randomized sets of simulated variant calls and performed overlap analysis against the same data source. Across the simulations, we recorded the number of instances in which a simulated set showed a higher number of overlaps than the real variant data set. A <it>P</it>-value was computed as the fraction of simulations whose number of overlaps was greater than the number of real overlaps.</p>
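<p>The simulation-based <it>P</it>-value can be sketched as follows. This is a simplified illustration, not the Perl implementation used in the study: it assumes a single linear chromosome, places each simulated variant uniformly at random while preserving its size, and counts any shared base as an overlap. All function and parameter names are hypothetical.</p>
<p>
```python
import random


def count_overlaps(variants, features):
    """Number of variants sharing at least one base with any feature.

    Both arguments are lists of (start, end) half-open intervals.
    """
    return sum(
        any(min(v_end, f_end) > max(v_start, f_start) for f_start, f_end in features)
        for v_start, v_end in variants
    )


def empirical_p_value(variants, features, genome_length, n_sim=1000, seed=0):
    """Fraction of randomized variant sets with more overlaps than the real set."""
    rng = random.Random(seed)
    observed = count_overlaps(variants, features)
    exceed = 0
    for _ in range(n_sim):
        simulated = []
        for start, end in variants:
            size = end - start
            # Assumes every variant is no longer than the genome.
            new_start = rng.randrange(genome_length - size + 1)
            simulated.append((new_start, new_start + size))
        if count_overlaps(simulated, features) > observed:
            exceed += 1
    return exceed / n_sim


# Hypothetical toy data: one variant that misses the only feature.
p = empirical_p_value([(500, 510)], [(0, 100)], genome_length=1000)
print(p)
```
</p>
<p>Under this definition, values near 0 indicate that the real variants overlap a feature class more often than randomized variants do, whereas values near 1 indicate that they overlap it less often.</p>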
</sec>
<sec>
<st>
<p>Structural variation imputation</p>
</st>
<p>Using a cutoff of 50% reciprocal overlap, there were 405 sites of overlap between the Venter loci and the genotyped, validated Genome Structural Variation (GSV) loci. The best <it>r</it>
<sup>2 </sup>value was computed between each of those GSV CNVs and the HapMap SNPs genotyped in Europeans (CEU) in the neighboring genomic region. Here, we defined a minimum threshold of <it>r</it>
<sup>2 </sup>= 0.8, below which the Venter SVs were deemed not well imputed by SNPs. A detailed description of genotyping, phasing, and tagging calls onto haplotypes defined by HapMap SNPs is presented in the Conrad <it>et al. </it>study <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>.</p>
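<p>The tagging criterion can be illustrated with a small sketch that takes the best <it>r</it><sup>2</sup> between one SV and its neighboring SNPs and compares it with the 0.8 threshold. As a simplifying assumption, the example correlates unphased genotype dosages (0, 1 or 2 copies of an allele) rather than the phased haplotypes used in the cited study, and the SNP identifiers and genotypes are invented for illustration.</p>
<p>
```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site carries no tagging information
    return sxy * sxy / (sxx * syy)


def best_tag(sv_genotypes, snp_genotypes, threshold=0.8):
    """Return (best r^2, True when the SV counts as well tagged)."""
    best = max(
        (r_squared(sv_genotypes, g) for g in snp_genotypes.values()),
        default=0.0,
    )
    return best, best >= threshold


# Hypothetical dosages for six individuals.
sv = [0, 1, 2, 1, 0, 2]
snps = {
    "snp_a": [0, 1, 2, 1, 0, 2],  # tracks the SV in every individual
    "snp_b": [2, 2, 0, 1, 1, 0],
}
print(best_tag(sv, snps))
```
</p>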
</sec>
<sec>
<st>
<p>Data release</p>
</st>
<p>The sequence trace files generated from previous studies <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B39">39</abbr>
</abbrgrp> can be obtained from the 'NCBI Trace Archive', using queries [CENTER_NAME = "JCVI" and SPECIES_CODE = "HOMO SAPIENS" and center_project = "GENOMIC-SEQUENCING-DIPLOID-HUMAN-REFERENCE-GENOME"], [INSERT_SIZE = 10201 and CENTER_NAME = "CRA" and SPECIES_CODE = "homo sapiens"], and [INSERT_SIZE = 1925 and CENTER_NAME = "CRA" and SPECIES_CODE = "homo sapiens"]. All of the microarray data generated in this study are available at the Gene Expression Omnibus (GEO) under the accession number [GEO:GSE20290]. The SV locations, size, and zygosity (when available), are reported in Additional files <supplr sid="S3">3</supplr>, <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>, <supplr sid="S6">6</supplr>, <supplr sid="S7">7</supplr>, <supplr sid="S8">8</supplr> and <supplr sid="S9">9</supplr>, and a non-redundant set of variant data in the Venter genome is reported in Additional files <supplr sid="S19">19</supplr>, <supplr sid="S20">20</supplr> and <supplr sid="S21">21</supplr>.</p>
</sec>
</sec>
<sec>
<st>
<p>Abbreviations</p>
</st>
<p>bp: base pair; CGH: comparative genomic hybridization; CNV: copy number variation; FISH: fluorescence <it>in situ </it>hybridization; GSV: Genome Structural Variation; indel: insertion/deletion; NCBI: National Center for Biotechnology Information; NGS: next generation sequencing; OMIM: Online Mendelian Inheritance in Man; qPCR: quantitative real-time PCR; SINE: short interspersed nuclear element; SNP: single nucleotide polymorphism; SV: structural variation.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>AWP, JRM, DP, DFC, HP, MEH, CL, JCV, EFK, SL, LF and SWS conceived and designed the experiments. AWP, JRM, JW, MAR, and LF performed the mate-pair and split-read analysis, as well as the Affymetrix 6.0 and Illumina 1 M experiments. HP and CL performed the Agilent 24 M experiments, while DP, DFC, and MEH did the NimbleGen 42 M experiments. All authors analyzed the data. AWP, LF and SWS wrote the paper. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>The work is supported by Genome Canada/Ontario Genomics Institute, the Canadian Institutes of Health Research (CIHR), the McLaughlin Centre for Molecular Medicine, the Canadian Institute for Advanced Research, and the Hospital for Sick Children (SickKids) Foundation. AWP holds the Natural Sciences and Engineering Research Council of Canada (NSERC) Alexander Graham Bell Canada Graduate Scholarship. DP is supported by fellowships from the Royal Netherlands Academy of Arts and Sciences (TMF/DA/5801) and the Netherlands Organization for Scientific Research (Rubicon, 825.06.031). LF is supported by the G&#246;ran Gustafsson Foundation and the Swedish Foundation for Strategic Research. SWS holds the GlaxoSmithKline-CIHR Pathfinder Chair in Genetics and Genomics at the University of Toronto and Hospital for Sick Children.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>The diploid genome sequence of an individual human.</p></title><aug><au><snm>Levy</snm><fnm>S</fnm></au><au><snm>Sutton</snm><fnm>G</fnm></au><au><snm>Ng</snm><fnm>PC</fnm></au><au><snm>Feuk</snm><fnm>L</fnm></au><au><snm>Halpern</snm><fnm>AL</fnm></au><au><snm>Walenz</snm><fnm>BP</fnm></au><au><snm>Axelrod</snm><fnm>N</fnm></au><au><snm>Huang</snm><fnm>J</fnm></au><au><snm>Kirkness</snm><fnm>EF</fnm></au><au><snm>Denisov</snm><fnm>G</fnm></au><au><snm>Lin</snm><fnm>Y</fnm></au><au><snm>MacDonald</snm><fnm>JR</fnm></au><au><snm>Pang</snm><fnm>AW</fnm></au><au><snm>Shago</snm><fnm>M</fnm></au><au><snm>Stockwell</snm><fnm>TB</fnm></au><au><snm>Tsiamouri</snm><fnm>A</fnm></au><au><snm>Bafna</snm><fnm>V</fnm></au><au><snm>Bansal</snm><fnm>V</fnm></au><au><snm>Kravitz</snm><fnm>SA</fnm></au><au><snm>Busam</snm><fnm>DA</fnm></au><au><snm>Beeson</snm><fnm>KY</fnm></au><au><snm>McIntosh</snm><fnm>TC</fnm></au><au><snm>Remington</snm><fnm>KA</fnm></au><au><snm>Abril</snm><fnm>JF</fnm></au><au><snm>Gill</snm><fnm>J</fnm></au><au><snm>Borman</snm><fnm>J</fnm></au><au><snm>Rogers</snm><fnm>YH</fnm></au><au><snm>Frazier</snm><fnm>ME</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au><au><snm>Strausberg</snm><fnm>RL</fnm></au><etal/></aug><source>PLoS Biol</source><pubdate>2007</pubdate><volume>5</volume><fpage>e254</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pbio.0050254</pubid><pubid idtype="pmcid">1964779</pubid><pubid idtype="pmpid" link="fulltext">17803354</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>The complete genome of an individual by massively parallel DNA 
sequencing.</p></title><aug><au><snm>Wheeler</snm><fnm>DA</fnm></au><au><snm>Srinivasan</snm><fnm>M</fnm></au><au><snm>Egholm</snm><fnm>M</fnm></au><au><snm>Shen</snm><fnm>Y</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>McGuire</snm><fnm>A</fnm></au><au><snm>He</snm><fnm>W</fnm></au><au><snm>Chen</snm><fnm>YJ</fnm></au><au><snm>Makhijani</snm><fnm>V</fnm></au><au><snm>Roth</snm><fnm>GT</fnm></au><au><snm>Gomes</snm><fnm>X</fnm></au><au><snm>Tartaro</snm><fnm>K</fnm></au><au><snm>Niazi</snm><fnm>F</fnm></au><au><snm>Turcotte</snm><fnm>CL</fnm></au><au><snm>Irzyk</snm><fnm>GP</fnm></au><au><snm>Lupski</snm><fnm>JR</fnm></au><au><snm>Chinault</snm><fnm>C</fnm></au><au><snm>Song</snm><fnm>XZ</fnm></au><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Yuan</snm><fnm>Y</fnm></au><au><snm>Nazareth</snm><fnm>L</fnm></au><au><snm>Qin</snm><fnm>X</fnm></au><au><snm>Muzny</snm><fnm>DM</fnm></au><au><snm>Margulies</snm><fnm>M</fnm></au><au><snm>Weinstock</snm><fnm>GM</fnm></au><au><snm>Gibbs</snm><fnm>RA</fnm></au><au><snm>Rothberg</snm><fnm>JM</fnm></au></aug><source>Nature</source><pubdate>2008</pubdate><volume>452</volume><fpage>872</fpage><lpage>876</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature06884</pubid><pubid idtype="pmpid" link="fulltext">18421352</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Accurate whole human genome sequencing using reversible terminator chemistry.</p></title><aug><au><snm>Bentley</snm><fnm>DR</fnm></au><au><snm>Balasubramanian</snm><fnm>S</fnm></au><au><snm>Swerdlow</snm><fnm>HP</fnm></au><au><snm>Smith</snm><fnm>GP</fnm></au><au><snm>Milton</snm><fnm>J</fnm></au><au><snm>Brown</snm><fnm>CG</fnm></au><au><snm>Hall</snm><fnm>KP</fnm></au><au><snm>Evers</snm><fnm>DJ</fnm></au><au><snm>Barnes</snm><fnm>CL</fnm></au><au><snm>Bignell</snm><fnm>HR</fnm></au><au><snm>Boutell</snm><fnm>JM</fnm></au><au><snm>Bryant</snm><fnm>J</fnm></au><au><snm>Carter</snm><fnm>RJ</fnm></au><au><snm>Keira 
Cheetham</snm><fnm>R</fnm></au><au><snm>Cox</snm><fnm>AJ</fnm></au><au><snm>Ellis</snm><fnm>DJ</fnm></au><au><snm>Flatbush</snm><fnm>MR</fnm></au><au><snm>Gormley</snm><fnm>NA</fnm></au><au><snm>Humphray</snm><fnm>SJ</fnm></au><au><snm>Irving</snm><fnm>LJ</fnm></au><au><snm>Karbelashvili</snm><fnm>MS</fnm></au><au><snm>Kirk</snm><fnm>SM</fnm></au><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Liu</snm><fnm>X</fnm></au><au><snm>Maisinger</snm><fnm>KS</fnm></au><au><snm>Murray</snm><fnm>LJ</fnm></au><au><snm>Obradovic</snm><fnm>B</fnm></au><au><snm>Ost</snm><fnm>T</fnm></au><au><snm>Parkinson</snm><fnm>ML</fnm></au><au><snm>Pratt</snm><fnm>MR</fnm></au><etal/></aug><source>Nature</source><pubdate>2008</pubdate><volume>456</volume><fpage>53</fpage><lpage>59</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature07517</pubid><pubid idtype="pmcid">2581791</pubid><pubid idtype="pmpid" link="fulltext">18987734</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>The diploid genome sequence of an Asian 
individual.</p></title><aug><au><snm>Wang</snm><fnm>J</fnm></au><au><snm>Wang</snm><fnm>W</fnm></au><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Tian</snm><fnm>G</fnm></au><au><snm>Goodman</snm><fnm>L</fnm></au><au><snm>Fan</snm><fnm>W</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Li</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Guo</snm><fnm>Y</fnm></au><au><snm>Feng</snm><fnm>B</fnm></au><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Lu</snm><fnm>Y</fnm></au><au><snm>Fang</snm><fnm>X</fnm></au><au><snm>Liang</snm><fnm>H</fnm></au><au><snm>Du</snm><fnm>Z</fnm></au><au><snm>Li</snm><fnm>D</fnm></au><au><snm>Zhao</snm><fnm>Y</fnm></au><au><snm>Hu</snm><fnm>Y</fnm></au><au><snm>Yang</snm><fnm>Z</fnm></au><au><snm>Zheng</snm><fnm>H</fnm></au><au><snm>Hellmann</snm><fnm>I</fnm></au><au><snm>Inouye</snm><fnm>M</fnm></au><au><snm>Pool</snm><fnm>J</fnm></au><au><snm>Yi</snm><fnm>X</fnm></au><au><snm>Zhao</snm><fnm>J</fnm></au><au><snm>Duan</snm><fnm>J</fnm></au><au><snm>Zhou</snm><fnm>Y</fnm></au><au><snm>Qin</snm><fnm>J</fnm></au><etal/></aug><source>Nature</source><pubdate>2008</pubdate><volume>456</volume><fpage>60</fpage><lpage>65</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature07484</pubid><pubid idtype="pmcid">2716080</pubid><pubid idtype="pmpid" link="fulltext">18987735</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>DNA sequencing of a cytogenetically normal acute myeloid leukaemia 
genome.</p></title><aug><au><snm>Ley</snm><fnm>TJ</fnm></au><au><snm>Mardis</snm><fnm>ER</fnm></au><au><snm>Ding</snm><fnm>L</fnm></au><au><snm>Fulton</snm><fnm>B</fnm></au><au><snm>McLellan</snm><fnm>MD</fnm></au><au><snm>Chen</snm><fnm>K</fnm></au><au><snm>Dooling</snm><fnm>D</fnm></au><au><snm>Dunford-Shore</snm><fnm>BH</fnm></au><au><snm>McGrath</snm><fnm>S</fnm></au><au><snm>Hickenbotham</snm><fnm>M</fnm></au><au><snm>Cook</snm><fnm>L</fnm></au><au><snm>Abbott</snm><fnm>R</fnm></au><au><snm>Larson</snm><fnm>DE</fnm></au><au><snm>Koboldt</snm><fnm>DC</fnm></au><au><snm>Pohl</snm><fnm>C</fnm></au><au><snm>Smith</snm><fnm>S</fnm></au><au><snm>Hawkins</snm><fnm>A</fnm></au><au><snm>Abbott</snm><fnm>S</fnm></au><au><snm>Locke</snm><fnm>D</fnm></au><au><snm>Hillier</snm><fnm>LW</fnm></au><au><snm>Miner</snm><fnm>T</fnm></au><au><snm>Fulton</snm><fnm>L</fnm></au><au><snm>Magrini</snm><fnm>V</fnm></au><au><snm>Wylie</snm><fnm>T</fnm></au><au><snm>Glasscock</snm><fnm>J</fnm></au><au><snm>Conyers</snm><fnm>J</fnm></au><au><snm>Sander</snm><fnm>N</fnm></au><au><snm>Shi</snm><fnm>X</fnm></au><au><snm>Osborne</snm><fnm>JR</fnm></au><au><snm>Minx</snm><fnm>P</fnm></au><etal/></aug><source>Nature</source><pubdate>2008</pubdate><volume>456</volume><fpage>66</fpage><lpage>72</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature07485</pubid><pubid idtype="pmcid">2603574</pubid><pubid idtype="pmpid" link="fulltext">18987736</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic 
group.</p></title><aug><au><snm>Ahn</snm><fnm>SM</fnm></au><au><snm>Kim</snm><fnm>TH</fnm></au><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Kim</snm><fnm>D</fnm></au><au><snm>Ghang</snm><fnm>H</fnm></au><au><snm>Kim</snm><fnm>D</fnm></au><au><snm>Kim</snm><fnm>BC</fnm></au><au><snm>Kim</snm><fnm>SY</fnm></au><au><snm>Kim</snm><fnm>WY</fnm></au><au><snm>Kim</snm><fnm>C</fnm></au><au><snm>Park</snm><fnm>D</fnm></au><au><snm>Lee</snm><fnm>YS</fnm></au><au><snm>Kim</snm><fnm>S</fnm></au><au><snm>Reja</snm><fnm>R</fnm></au><au><snm>Jho</snm><fnm>S</fnm></au><au><snm>Kim</snm><fnm>CG</fnm></au><au><snm>Cha</snm><fnm>JY</fnm></au><au><snm>Kim</snm><fnm>KH</fnm></au><au><snm>Lee</snm><fnm>B</fnm></au><au><snm>Bhak</snm><fnm>J</fnm></au><au><snm>Kim</snm><fnm>SJ</fnm></au></aug><source>Genome Res</source><pubdate>2009</pubdate><volume>19</volume><fpage>1622</fpage><lpage>1629</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.092197.109</pubid><pubid idtype="pmcid">2752128</pubid><pubid idtype="pmpid" link="fulltext">19470904</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>A highly annotated whole-genome sequence of a Korean 
individual.</p></title><aug><au><snm>Kim</snm><fnm>JI</fnm></au><au><snm>Ju</snm><fnm>YS</fnm></au><au><snm>Park</snm><fnm>H</fnm></au><au><snm>Kim</snm><fnm>S</fnm></au><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Yi</snm><fnm>JH</fnm></au><au><snm>Mudge</snm><fnm>J</fnm></au><au><snm>Miller</snm><fnm>NA</fnm></au><au><snm>Hong</snm><fnm>D</fnm></au><au><snm>Bell</snm><fnm>CJ</fnm></au><au><snm>Kim</snm><fnm>HS</fnm></au><au><snm>Chung</snm><fnm>IS</fnm></au><au><snm>Lee</snm><fnm>WC</fnm></au><au><snm>Lee</snm><fnm>JS</fnm></au><au><snm>Seo</snm><fnm>SH</fnm></au><au><snm>Yun</snm><fnm>JY</fnm></au><au><snm>Woo</snm><fnm>HN</fnm></au><au><snm>Lee</snm><fnm>H</fnm></au><au><snm>Suh</snm><fnm>D</fnm></au><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Kim</snm><fnm>HJ</fnm></au><au><snm>Yavartanoo</snm><fnm>M</fnm></au><au><snm>Kwak</snm><fnm>M</fnm></au><au><snm>Zheng</snm><fnm>Y</fnm></au><au><snm>Lee</snm><fnm>MK</fnm></au><au><snm>Park</snm><fnm>H</fnm></au><au><snm>Kim</snm><fnm>JY</fnm></au><au><snm>Gokcumen</snm><fnm>O</fnm></au><au><snm>Mills</snm><fnm>RE</fnm></au><au><snm>Zaranek</snm><fnm>AW</fnm></au><etal/></aug><source>Nature</source><pubdate>2009</pubdate><volume>460</volume><fpage>1011</fpage><lpage>1015</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2860965</pubid><pubid idtype="pmpid" link="fulltext">19587683</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base 
encoding.</p></title><aug><au><snm>McKernan</snm><fnm>KJ</fnm></au><au><snm>Peckham</snm><fnm>HE</fnm></au><au><snm>Costa</snm><fnm>GL</fnm></au><au><snm>McLaughlin</snm><fnm>SF</fnm></au><au><snm>Fu</snm><fnm>Y</fnm></au><au><snm>Tsung</snm><fnm>EF</fnm></au><au><snm>Clouser</snm><fnm>CR</fnm></au><au><snm>Duncan</snm><fnm>C</fnm></au><au><snm>Ichikawa</snm><fnm>JK</fnm></au><au><snm>Lee</snm><fnm>CC</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Ranade</snm><fnm>SS</fnm></au><au><snm>Dimalanta</snm><fnm>ET</fnm></au><au><snm>Hyland</snm><fnm>FC</fnm></au><au><snm>Sokolsky</snm><fnm>TD</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Sheridan</snm><fnm>A</fnm></au><au><snm>Fu</snm><fnm>H</fnm></au><au><snm>Hendrickson</snm><fnm>CL</fnm></au><au><snm>Li</snm><fnm>B</fnm></au><au><snm>Kotler</snm><fnm>L</fnm></au><au><snm>Stuart</snm><fnm>JR</fnm></au><au><snm>Malek</snm><fnm>JA</fnm></au><au><snm>Manning</snm><fnm>JM</fnm></au><au><snm>Antipova</snm><fnm>AA</fnm></au><au><snm>Perez</snm><fnm>DS</fnm></au><au><snm>Moore</snm><fnm>MP</fnm></au><au><snm>Hayashibara</snm><fnm>KC</fnm></au><au><snm>Lyons</snm><fnm>MR</fnm></au><au><snm>Beaudoin</snm><fnm>RE</fnm></au><etal/></aug><source>Genome Res</source><pubdate>2009</pubdate><volume>19</volume><fpage>1527</fpage><lpage>1541</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.091868.109</pubid><pubid idtype="pmcid">2752135</pubid><pubid idtype="pmpid" link="fulltext">19546169</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Structural variation in the human genome.</p></title><aug><au><snm>Feuk</snm><fnm>L</fnm></au><au><snm>Carson</snm><fnm>AR</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2006</pubdate><volume>7</volume><fpage>85</fpage><lpage>97</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg1767</pubid><pubid idtype="pmpid" link="fulltext">16418744</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Common and 
rare variants in multifactorial susceptibility to common diseases.</p></title><aug><au><snm>Bodmer</snm><fnm>W</fnm></au><au><snm>Bonilla</snm><fnm>C</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>695</fpage><lpage>701</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.f.136</pubid><pubid idtype="pmcid">2527050</pubid><pubid idtype="pmpid" link="fulltext">18509313</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays.</p></title><aug><au><snm>Drmanac</snm><fnm>R</fnm></au><au><snm>Sparks</snm><fnm>AB</fnm></au><au><snm>Callow</snm><fnm>MJ</fnm></au><au><snm>Halpern</snm><fnm>AL</fnm></au><au><snm>Burns</snm><fnm>NL</fnm></au><au><snm>Kermani</snm><fnm>BG</fnm></au><au><snm>Carnevali</snm><fnm>P</fnm></au><au><snm>Nazarenko</snm><fnm>I</fnm></au><au><snm>Nilsen</snm><fnm>GB</fnm></au><au><snm>Yeung</snm><fnm>G</fnm></au><au><snm>Dahl</snm><fnm>F</fnm></au><au><snm>Fernandez</snm><fnm>A</fnm></au><au><snm>Staker</snm><fnm>B</fnm></au><au><snm>Pant</snm><fnm>KP</fnm></au><au><snm>Baccash</snm><fnm>J</fnm></au><au><snm>Borcherding</snm><fnm>AP</fnm></au><au><snm>Brownley</snm><fnm>A</fnm></au><au><snm>Cedeno</snm><fnm>R</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Chernikoff</snm><fnm>D</fnm></au><au><snm>Cheung</snm><fnm>A</fnm></au><au><snm>Chirita</snm><fnm>R</fnm></au><au><snm>Curson</snm><fnm>B</fnm></au><au><snm>Ebert</snm><fnm>JC</fnm></au><au><snm>Hacker</snm><fnm>CR</fnm></au><au><snm>Hartlage</snm><fnm>R</fnm></au><au><snm>Hauser</snm><fnm>B</fnm></au><au><snm>Huang</snm><fnm>S</fnm></au><au><snm>Jiang</snm><fnm>Y</fnm></au><au><snm>Karpinchyk</snm><fnm>V</fnm></au><etal/></aug><source>Science</source><pubdate>2010</pubdate><volume>327</volume><fpage>78</fpage><lpage>81</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1181498</pubid><pubid idtype="pmpid" 
link="fulltext">19892942</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Detection of large-scale variation in the human genome.</p></title><aug><au><snm>Iafrate</snm><fnm>AJ</fnm></au><au><snm>Feuk</snm><fnm>L</fnm></au><au><snm>Rivera</snm><fnm>MN</fnm></au><au><snm>Listewnik</snm><fnm>ML</fnm></au><au><snm>Donahoe</snm><fnm>PK</fnm></au><au><snm>Qi</snm><fnm>Y</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au></aug><source>Nat Genet</source><pubdate>2004</pubdate><volume>36</volume><fpage>949</fpage><lpage>951</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1416</pubid><pubid idtype="pmpid" link="fulltext">15286789</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Large-scale copy number polymorphism in the human genome.</p></title><aug><au><snm>Sebat</snm><fnm>J</fnm></au><au><snm>Lakshmi</snm><fnm>B</fnm></au><au><snm>Troge</snm><fnm>J</fnm></au><au><snm>Alexander</snm><fnm>J</fnm></au><au><snm>Young</snm><fnm>J</fnm></au><au><snm>Lundin</snm><fnm>P</fnm></au><au><snm>Maner</snm><fnm>S</fnm></au><au><snm>Massa</snm><fnm>H</fnm></au><au><snm>Walker</snm><fnm>M</fnm></au><au><snm>Chi</snm><fnm>M</fnm></au><au><snm>Navin</snm><fnm>N</fnm></au><au><snm>Lucito</snm><fnm>R</fnm></au><au><snm>Healy</snm><fnm>J</fnm></au><au><snm>Hicks</snm><fnm>J</fnm></au><au><snm>Ye</snm><fnm>K</fnm></au><au><snm>Reiner</snm><fnm>A</fnm></au><au><snm>Gilliam</snm><fnm>TC</fnm></au><au><snm>Trask</snm><fnm>B</fnm></au><au><snm>Patterson</snm><fnm>N</fnm></au><au><snm>Zetterberg</snm><fnm>A</fnm></au><au><snm>Wigler</snm><fnm>M</fnm></au></aug><source>Science</source><pubdate>2004</pubdate><volume>305</volume><fpage>525</fpage><lpage>528</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1098918</pubid><pubid idtype="pmpid" link="fulltext">15273396</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Global variation in copy number in the human 
genome.</p></title><aug><au><snm>Redon</snm><fnm>R</fnm></au><au><snm>Ishikawa</snm><fnm>S</fnm></au><au><snm>Fitch</snm><fnm>KR</fnm></au><au><snm>Feuk</snm><fnm>L</fnm></au><au><snm>Perry</snm><fnm>GH</fnm></au><au><snm>Andrews</snm><fnm>TD</fnm></au><au><snm>Fiegler</snm><fnm>H</fnm></au><au><snm>Shapero</snm><fnm>MH</fnm></au><au><snm>Carson</snm><fnm>AR</fnm></au><au><snm>Chen</snm><fnm>W</fnm></au><au><snm>Cho</snm><fnm>EK</fnm></au><au><snm>Dallaire</snm><fnm>S</fnm></au><au><snm>Freeman</snm><fnm>JL</fnm></au><au><snm>Gonzalez</snm><fnm>JR</fnm></au><au><snm>Gratacos</snm><fnm>M</fnm></au><au><snm>Huang</snm><fnm>J</fnm></au><au><snm>Kalaitzopoulos</snm><fnm>D</fnm></au><au><snm>Komura</snm><fnm>D</fnm></au><au><snm>MacDonald</snm><fnm>JR</fnm></au><au><snm>Marshall</snm><fnm>CR</fnm></au><au><snm>Mei</snm><fnm>R</fnm></au><au><snm>Montgomery</snm><fnm>L</fnm></au><au><snm>Nishimura</snm><fnm>K</fnm></au><au><snm>Okamura</snm><fnm>K</fnm></au><au><snm>Shen</snm><fnm>F</fnm></au><au><snm>Somerville</snm><fnm>MJ</fnm></au><au><snm>Tchinda</snm><fnm>J</fnm></au><au><snm>Valsesia</snm><fnm>A</fnm></au><au><snm>Woodwark</snm><fnm>C</fnm></au><au><snm>Yang</snm><fnm>F</fnm></au><etal/></aug><source>Nature</source><pubdate>2006</pubdate><volume>444</volume><fpage>444</fpage><lpage>454</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature05329</pubid><pubid idtype="pmcid">2669898</pubid><pubid idtype="pmpid" link="fulltext">17122850</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Fine-scale structural variation of the human 
genome.</p></title><aug><au><snm>Tuzun</snm><fnm>E</fnm></au><au><snm>Sharp</snm><fnm>AJ</fnm></au><au><snm>Bailey</snm><fnm>JA</fnm></au><au><snm>Kaul</snm><fnm>R</fnm></au><au><snm>Morrison</snm><fnm>VA</fnm></au><au><snm>Pertz</snm><fnm>LM</fnm></au><au><snm>Haugen</snm><fnm>E</fnm></au><au><snm>Hayden</snm><fnm>H</fnm></au><au><snm>Albertson</snm><fnm>D</fnm></au><au><snm>Pinkel</snm><fnm>D</fnm></au><au><snm>Olson</snm><fnm>MV</fnm></au><au><snm>Eichler</snm><fnm>EE</fnm></au></aug><source>Nat Genet</source><pubdate>2005</pubdate><volume>37</volume><fpage>727</fpage><lpage>732</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1562</pubid><pubid idtype="pmpid" link="fulltext">15895083</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Genome assembly comparison identifies structural variants in the human genome.</p></title><aug><au><snm>Khaja</snm><fnm>R</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>MacDonald</snm><fnm>JR</fnm></au><au><snm>He</snm><fnm>Y</fnm></au><au><snm>Joseph-George</snm><fnm>AM</fnm></au><au><snm>Wei</snm><fnm>J</fnm></au><au><snm>Rafiq</snm><fnm>MA</fnm></au><au><snm>Qian</snm><fnm>C</fnm></au><au><snm>Shago</snm><fnm>M</fnm></au><au><snm>Pantano</snm><fnm>L</fnm></au><au><snm>Aburatani</snm><fnm>H</fnm></au><au><snm>Jones</snm><fnm>K</fnm></au><au><snm>Redon</snm><fnm>R</fnm></au><au><snm>Hurles</snm><fnm>M</fnm></au><au><snm>Armengol</snm><fnm>L</fnm></au><au><snm>Estivill</snm><fnm>X</fnm></au><au><snm>Mural</snm><fnm>RJ</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au><au><snm>Feuk</snm><fnm>L</fnm></au></aug><source>Nat Genet</source><pubdate>2006</pubdate><volume>38</volume><fpage>1413</fpage><lpage>1418</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1921</pubid><pubid idtype="pmcid">2674632</pubid><pubid idtype="pmpid" link="fulltext">17115057</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Paired-end mapping reveals extensive structural 
variation in the human genome.</p></title><aug><au><snm>Korbel</snm><fnm>JO</fnm></au><au><snm>Urban</snm><fnm>AE</fnm></au><au><snm>Affourtit</snm><fnm>JP</fnm></au><au><snm>Godwin</snm><fnm>B</fnm></au><au><snm>Grubert</snm><fnm>F</fnm></au><au><snm>Simons</snm><fnm>JF</fnm></au><au><snm>Kim</snm><fnm>PM</fnm></au><au><snm>Palejev</snm><fnm>D</fnm></au><au><snm>Carriero</snm><fnm>NJ</fnm></au><au><snm>Du</snm><fnm>L</fnm></au><au><snm>Taillon</snm><fnm>BE</fnm></au><au><snm>Chen</snm><fnm>Z</fnm></au><au><snm>Tanzer</snm><fnm>A</fnm></au><au><snm>Saunders</snm><fnm>AC</fnm></au><au><snm>Chi</snm><fnm>J</fnm></au><au><snm>Yang</snm><fnm>F</fnm></au><au><snm>Carter</snm><fnm>NP</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au><au><snm>Weissman</snm><fnm>SM</fnm></au><au><snm>Harkins</snm><fnm>TT</fnm></au><au><snm>Gerstein</snm><fnm>MB</fnm></au><au><snm>Egholm</snm><fnm>M</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au></aug><source>Science</source><pubdate>2007</pubdate><volume>318</volume><fpage>420</fpage><lpage>426</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1149504</pubid><pubid idtype="pmcid">2674581</pubid><pubid idtype="pmpid" link="fulltext">17901297</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Mapping and sequencing of structural variation from eight human 
genomes.</p></title><aug><au><snm>Kidd</snm><fnm>JM</fnm></au><au><snm>Cooper</snm><fnm>GM</fnm></au><au><snm>Donahue</snm><fnm>WF</fnm></au><au><snm>Hayden</snm><fnm>HS</fnm></au><au><snm>Sampas</snm><fnm>N</fnm></au><au><snm>Graves</snm><fnm>T</fnm></au><au><snm>Hansen</snm><fnm>N</fnm></au><au><snm>Teague</snm><fnm>B</fnm></au><au><snm>Alkan</snm><fnm>C</fnm></au><au><snm>Antonacci</snm><fnm>F</fnm></au><au><snm>Haugen</snm><fnm>E</fnm></au><au><snm>Zerr</snm><fnm>T</fnm></au><au><snm>Yamada</snm><fnm>NA</fnm></au><au><snm>Tsang</snm><fnm>P</fnm></au><au><snm>Newman</snm><fnm>TL</fnm></au><au><snm>Tuzun</snm><fnm>E</fnm></au><au><snm>Cheng</snm><fnm>Z</fnm></au><au><snm>Ebling</snm><fnm>HM</fnm></au><au><snm>Tusneem</snm><fnm>N</fnm></au><au><snm>David</snm><fnm>R</fnm></au><au><snm>Gillett</snm><fnm>W</fnm></au><au><snm>Phelps</snm><fnm>KA</fnm></au><au><snm>Weaver</snm><fnm>M</fnm></au><au><snm>Saranga</snm><fnm>D</fnm></au><au><snm>Brand</snm><fnm>A</fnm></au><au><snm>Tao</snm><fnm>W</fnm></au><au><snm>Gustafson</snm><fnm>E</fnm></au><au><snm>McKernan</snm><fnm>K</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Malig</snm><fnm>M</fnm></au><etal/></aug><source>Nature</source><pubdate>2008</pubdate><volume>453</volume><fpage>56</fpage><lpage>64</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature06862</pubid><pubid idtype="pmcid">2424287</pubid><pubid idtype="pmpid" link="fulltext">18451855</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Origins and functional impact of copy number variation in the human 
genome.</p></title><aug><au><snm>Conrad</snm><fnm>DF</fnm></au><au><snm>Pinto</snm><fnm>D</fnm></au><au><snm>Redon</snm><fnm>R</fnm></au><au><snm>Feuk</snm><fnm>L</fnm></au><au><snm>Gokcumen</snm><fnm>O</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Aerts</snm><fnm>J</fnm></au><au><snm>Andrews</snm><fnm>TD</fnm></au><au><snm>Barnes</snm><fnm>C</fnm></au><au><snm>Campbell</snm><fnm>P</fnm></au><au><snm>Fitzgerald</snm><fnm>T</fnm></au><au><snm>Hu</snm><fnm>M</fnm></au><au><snm>Ihm</snm><fnm>CH</fnm></au><au><snm>Kristiansson</snm><fnm>K</fnm></au><au><snm>Macarthur</snm><fnm>DG</fnm></au><au><snm>Macdonald</snm><fnm>JR</fnm></au><au><snm>Onyiah</snm><fnm>I</fnm></au><au><snm>Pang</snm><fnm>AW</fnm></au><au><snm>Robson</snm><fnm>S</fnm></au><au><snm>Stirrups</snm><fnm>K</fnm></au><au><snm>Valsesia</snm><fnm>A</fnm></au><au><snm>Walter</snm><fnm>K</fnm></au><au><snm>Wei</snm><fnm>J</fnm></au><au><snm>Tyler-Smith</snm><fnm>C</fnm></au><au><snm>Carter</snm><fnm>NP</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au></aug><source>Nature</source><pubdate>2010</pubdate><volume>464</volume><fpage>704</fpage><lpage>712</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08516</pubid><pubid idtype="pmpid" link="fulltext">19812545</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Contemplating effects of genomic structural variation.</p></title><aug><au><snm>Buchanan</snm><fnm>JA</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au></aug><source>Genet Med</source><pubdate>2008</pubdate><volume>10</volume><fpage>639</fpage><lpage>647</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1097/GIM.0b013e318183f848</pubid><pubid idtype="pmpid" link="fulltext">18978673</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Evaluation of next generation sequencing platforms for population targeted sequencing 
studies.</p></title><aug><au><snm>Harismendy</snm><fnm>O</fnm></au><au><snm>Ng</snm><fnm>PC</fnm></au><au><snm>Strausberg</snm><fnm>RL</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Stockwell</snm><fnm>TB</fnm></au><au><snm>Beeson</snm><fnm>KY</fnm></au><au><snm>Schork</snm><fnm>NJ</fnm></au><au><snm>Murray</snm><fnm>SS</fnm></au><au><snm>Topol</snm><fnm>EJ</fnm></au><au><snm>Levy</snm><fnm>S</fnm></au><au><snm>Frazer</snm><fnm>KA</fnm></au></aug><source>Genome Biol</source><pubdate>2009</pubdate><volume>10</volume><fpage>R32</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2009-10-3-r32</pubid><pubid idtype="pmcid">2691003</pubid><pubid idtype="pmpid" link="fulltext">19327155</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Challenges and standards in integrating surveys of structural variation.</p></title><aug><au><snm>Scherer</snm><fnm>SW</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Birney</snm><fnm>E</fnm></au><au><snm>Altshuler</snm><fnm>DM</fnm></au><au><snm>Eichler</snm><fnm>EE</fnm></au><au><snm>Carter</snm><fnm>NP</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au><au><snm>Feuk</snm><fnm>L</fnm></au></aug><source>Nat Genet</source><pubdate>2007</pubdate><volume>39</volume><fpage>S7</fpage><lpage>15</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng2093</pubid><pubid idtype="pmcid">2698291</pubid><pubid idtype="pmpid" link="fulltext">17597783</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>An initial map of insertion and deletion (INDEL) variation in the human genome.</p></title><aug><au><snm>Mills</snm><fnm>RE</fnm></au><au><snm>Luttig</snm><fnm>CT</fnm></au><au><snm>Larkins</snm><fnm>CE</fnm></au><au><snm>Beauchamp</snm><fnm>A</fnm></au><au><snm>Tsui</snm><fnm>C</fnm></au><au><snm>Pittard</snm><fnm>WS</fnm></au><au><snm>Devine</snm><fnm>SE</fnm></au></aug><source>Genome Res</source><pubdate>2006</pubdate><volume>16</volume><fpage>1182</fpage><lpage>1190</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1101/gr.4565806</pubid><pubid idtype="pmcid">1557762</pubid><pubid idtype="pmpid" link="fulltext">16902084</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Whole-genome shotgun assembly and comparison of human genome assemblies.</p></title><aug><au><snm>Istrail</snm><fnm>S</fnm></au><au><snm>Sutton</snm><fnm>GG</fnm></au><au><snm>Florea</snm><fnm>L</fnm></au><au><snm>Halpern</snm><fnm>AL</fnm></au><au><snm>Mobarry</snm><fnm>CM</fnm></au><au><snm>Lippert</snm><fnm>R</fnm></au><au><snm>Walenz</snm><fnm>B</fnm></au><au><snm>Shatkay</snm><fnm>H</fnm></au><au><snm>Dew</snm><fnm>I</fnm></au><au><snm>Miller</snm><fnm>JR</fnm></au><au><snm>Flanigan</snm><fnm>MJ</fnm></au><au><snm>Edwards</snm><fnm>NJ</fnm></au><au><snm>Bolanos</snm><fnm>R</fnm></au><au><snm>Fasulo</snm><fnm>D</fnm></au><au><snm>Halldorsson</snm><fnm>BV</fnm></au><au><snm>Hannenhalli</snm><fnm>S</fnm></au><au><snm>Turner</snm><fnm>R</fnm></au><au><snm>Yooseph</snm><fnm>S</fnm></au><au><snm>Lu</snm><fnm>F</fnm></au><au><snm>Nusskern</snm><fnm>DR</fnm></au><au><snm>Shue</snm><fnm>BC</fnm></au><au><snm>Zheng</snm><fnm>XH</fnm></au><au><snm>Zhong</snm><fnm>F</fnm></au><au><snm>Delcher</snm><fnm>AL</fnm></au><au><snm>Huson</snm><fnm>DH</fnm></au><au><snm>Kravitz</snm><fnm>SA</fnm></au><au><snm>Mouchard</snm><fnm>L</fnm></au><au><snm>Reinert</snm><fnm>K</fnm></au><au><snm>Remington</snm><fnm>KA</fnm></au><au><snm>Clark</snm><fnm>AG</fnm></au><etal/></aug><source>Proc Natl Acad Sci USA</source><pubdate>2004</pubdate><volume>101</volume><fpage>1916</fpage><lpage>1921</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0307971100</pubid><pubid idtype="pmcid">357027</pubid><pubid idtype="pmpid" link="fulltext">14769938</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence 
assemblies.</p></title><aug><au><snm>Feuk</snm><fnm>L</fnm></au><au><snm>MacDonald</snm><fnm>JR</fnm></au><au><snm>Tang</snm><fnm>T</fnm></au><au><snm>Carson</snm><fnm>AR</fnm></au><au><snm>Li</snm><fnm>M</fnm></au><au><snm>Rao</snm><fnm>G</fnm></au><au><snm>Khaja</snm><fnm>R</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au></aug><source>PLoS Genet</source><pubdate>2005</pubdate><volume>1</volume><fpage>e56</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.0010056</pubid><pubid idtype="pmcid">1270012</pubid><pubid idtype="pmpid" link="fulltext">16254605</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>What price personal genome exploration?</p></title><aug><au><snm>Fox</snm><fnm>JL</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2008</pubdate><volume>26</volume><fpage>1105</fpage><lpage>1108</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1008-1105</pubid><pubid idtype="pmpid" link="fulltext">18846082</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>An agenda for personalized medicine.</p></title><aug><au><snm>Ng</snm><fnm>PC</fnm></au><au><snm>Murray</snm><fnm>SS</fnm></au><au><snm>Levy</snm><fnm>S</fnm></au><au><snm>Venter</snm><fnm>JC</fnm></au></aug><source>Nature</source><pubdate>2009</pubdate><volume>461</volume><fpage>724</fpage><lpage>726</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/461724a</pubid><pubid idtype="pmpid" link="fulltext">19812653</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Personalized copy number and segmental duplication maps using next-generation 
sequencing.</p></title><aug><au><snm>Alkan</snm><fnm>C</fnm></au><au><snm>Kidd</snm><fnm>JM</fnm></au><au><snm>Marques-Bonet</snm><fnm>T</fnm></au><au><snm>Aksay</snm><fnm>G</fnm></au><au><snm>Antonacci</snm><fnm>F</fnm></au><au><snm>Hormozdiari</snm><fnm>F</fnm></au><au><snm>Kitzman</snm><fnm>JO</fnm></au><au><snm>Baker</snm><fnm>C</fnm></au><au><snm>Malig</snm><fnm>M</fnm></au><au><snm>Mutlu</snm><fnm>O</fnm></au><au><snm>Sahinalp</snm><fnm>SC</fnm></au><au><snm>Gibbs</snm><fnm>RA</fnm></au><au><snm>Eichler</snm><fnm>EE</fnm></au></aug><source>Nat Genet</source><pubdate>2009</pubdate><volume>41</volume><fpage>1061</fpage><lpage>1067</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.437</pubid><pubid idtype="pmcid">2875196</pubid><pubid idtype="pmpid" link="fulltext">19718026</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>A robust framework for detecting structural variations in a genome.</p></title><aug><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Cheran</snm><fnm>E</fnm></au><au><snm>Brudno</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><fpage>i59</fpage><lpage>67</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btn176</pubid><pubid idtype="pmcid">2718654</pubid><pubid idtype="pmpid" link="fulltext">18586745</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>BreakDancer: an algorithm for high-resolution mapping of genomic structural 
variation.</p></title><aug><au><snm>Chen</snm><fnm>K</fnm></au><au><snm>Wallis</snm><fnm>JW</fnm></au><au><snm>McLellan</snm><fnm>MD</fnm></au><au><snm>Larson</snm><fnm>DE</fnm></au><au><snm>Kalicki</snm><fnm>JM</fnm></au><au><snm>Pohl</snm><fnm>CS</fnm></au><au><snm>McGrath</snm><fnm>SD</fnm></au><au><snm>Wendl</snm><fnm>MC</fnm></au><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Locke</snm><fnm>DP</fnm></au><au><snm>Shi</snm><fnm>X</fnm></au><au><snm>Fulton</snm><fnm>RS</fnm></au><au><snm>Ley</snm><fnm>TJ</fnm></au><au><snm>Wilson</snm><fnm>RK</fnm></au><au><snm>Ding</snm><fnm>L</fnm></au><au><snm>Mardis</snm><fnm>ER</fnm></au></aug><source>Nat Methods</source><pubdate>2009</pubdate><volume>6</volume><fpage>677</fpage><lpage>681</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1363</pubid><pubid idtype="pmpid" link="fulltext">19668202</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library.</p></title><aug><au><snm>Lam</snm><fnm>HY</fnm></au><au><snm>Mu</snm><fnm>XJ</fnm></au><au><snm>Stutz</snm><fnm>AM</fnm></au><au><snm>Tanzer</snm><fnm>A</fnm></au><au><snm>Cayting</snm><fnm>PD</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Kim</snm><fnm>PM</fnm></au><au><snm>Korbel</snm><fnm>JO</fnm></au><au><snm>Gerstein</snm><fnm>MB</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2010</pubdate><volume>28</volume><fpage>47</fpage><lpage>55</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1600</pubid><pubid idtype="pmpid" link="fulltext">20037582</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Relative impact of nucleotide and copy number variation on gene expression 
phenotypes.</p></title><aug><au><snm>Stranger</snm><fnm>BE</fnm></au><au><snm>Forrest</snm><fnm>MS</fnm></au><au><snm>Dunning</snm><fnm>M</fnm></au><au><snm>Ingle</snm><fnm>CE</fnm></au><au><snm>Beazley</snm><fnm>C</fnm></au><au><snm>Thorne</snm><fnm>N</fnm></au><au><snm>Redon</snm><fnm>R</fnm></au><au><snm>Bird</snm><fnm>CP</fnm></au><au><snm>de Grassi</snm><fnm>A</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Tyler-Smith</snm><fnm>C</fnm></au><au><snm>Carter</snm><fnm>N</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au><au><snm>Tavare</snm><fnm>S</fnm></au><au><snm>Deloukas</snm><fnm>P</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au><au><snm>Dermitzakis</snm><fnm>ET</fnm></au></aug><source>Science</source><pubdate>2007</pubdate><volume>315</volume><fpage>848</fpage><lpage>853</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1136678</pubid><pubid idtype="pmcid">2665772</pubid><pubid idtype="pmpid" link="fulltext">17289997</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Genetic variation in an individual human exome.</p></title><aug><au><snm>Ng</snm><fnm>PC</fnm></au><au><snm>Levy</snm><fnm>S</fnm></au><au><snm>Huang</snm><fnm>J</fnm></au><au><snm>Stockwell</snm><fnm>TB</fnm></au><au><snm>Walenz</snm><fnm>BP</fnm></au><au><snm>Li</snm><fnm>K</fnm></au><au><snm>Axelrod</snm><fnm>N</fnm></au><au><snm>Busam</snm><fnm>DA</fnm></au><au><snm>Strausberg</snm><fnm>RL</fnm></au><au><snm>Venter</snm><fnm>JC</fnm></au></aug><source>PLoS Genet</source><pubdate>2008</pubdate><volume>4</volume><fpage>e1000160</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.1000160</pubid><pubid idtype="pmcid">2493042</pubid><pubid idtype="pmpid" link="fulltext">18704161</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Breakpoint mapping and array CGH in translocations: comparison of a phenotypically normal and an abnormal 
cohort.</p></title><aug><au><snm>Baptista</snm><fnm>J</fnm></au><au><snm>Mercer</snm><fnm>C</fnm></au><au><snm>Prigmore</snm><fnm>E</fnm></au><au><snm>Gribble</snm><fnm>SM</fnm></au><au><snm>Carter</snm><fnm>NP</fnm></au><au><snm>Maloney</snm><fnm>V</fnm></au><au><snm>Thomas</snm><fnm>NS</fnm></au><au><snm>Jacobs</snm><fnm>PA</fnm></au><au><snm>Crolla</snm><fnm>JA</fnm></au></aug><source>Am J Hum Genet</source><pubdate>2008</pubdate><volume>82</volume><fpage>927</fpage><lpage>936</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ajhg.2008.02.012</pubid><pubid idtype="pmcid">2427237</pubid><pubid idtype="pmpid" link="fulltext">18371933</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Characterization of apparently balanced chromosomal rearrangements from the developmental genome anatomy project.</p></title><aug><au><snm>Higgins</snm><fnm>AW</fnm></au><au><snm>Alkuraya</snm><fnm>FS</fnm></au><au><snm>Bosco</snm><fnm>AF</fnm></au><au><snm>Brown</snm><fnm>KK</fnm></au><au><snm>Bruns</snm><fnm>GA</fnm></au><au><snm>Donovan</snm><fnm>DJ</fnm></au><au><snm>Eisenman</snm><fnm>R</fnm></au><au><snm>Fan</snm><fnm>Y</fnm></au><au><snm>Farra</snm><fnm>CG</fnm></au><au><snm>Ferguson</snm><fnm>HL</fnm></au><au><snm>Gusella</snm><fnm>JF</fnm></au><au><snm>Harris</snm><fnm>DJ</fnm></au><au><snm>Herrick</snm><fnm>SR</fnm></au><au><snm>Kelly</snm><fnm>C</fnm></au><au><snm>Kim</snm><fnm>HG</fnm></au><au><snm>Kishikawa</snm><fnm>S</fnm></au><au><snm>Korf</snm><fnm>BR</fnm></au><au><snm>Kulkarni</snm><fnm>S</fnm></au><au><snm>Lally</snm><fnm>E</fnm></au><au><snm>Leach</snm><fnm>NT</fnm></au><au><snm>Lemyre</snm><fnm>E</fnm></au><au><snm>Lewis</snm><fnm>J</fnm></au><au><snm>Ligon</snm><fnm>AH</fnm></au><au><snm>Lu</snm><fnm>W</fnm></au><au><snm>Maas</snm><fnm>RL</fnm></au><au><snm>MacDonald</snm><fnm>ME</fnm></au><au><snm>Moore</snm><fnm>SD</fnm></au><au><snm>Peters</snm><fnm>RE</fnm></au><au><snm>Quade</snm><fnm>BJ</fnm></au><au><snm>Quintero-Rivera</snm><fnm>F</fnm></au>
<etal/></aug><source>Am J Hum Genet</source><pubdate>2008</pubdate><volume>82</volume><fpage>712</fpage><lpage>722</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ajhg.2008.01.011</pubid><pubid idtype="pmcid">2427206</pubid><pubid idtype="pmpid" link="fulltext">18319076</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Copy-number variation in control population cohorts.</p></title><aug><au><snm>Pinto</snm><fnm>D</fnm></au><au><snm>Marshall</snm><fnm>C</fnm></au><au><snm>Feuk</snm><fnm>L</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au></aug><source>Hum Mol Genet</source><pubdate>2007</pubdate><volume>16</volume><issue>Spec No 2</issue><fpage>R168</fpage><lpage>173</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/hmg/ddm241</pubid><pubid idtype="pmpid" link="fulltext">17911159</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Personal genomes: The case of the missing heritability.</p></title><aug><au><snm>Maher</snm><fnm>B</fnm></au></aug><source>Nature</source><pubdate>2008</pubdate><volume>456</volume><fpage>18</fpage><lpage>21</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/456018a</pubid><pubid idtype="pmpid" link="fulltext">18987709</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>The clinical context of copy number variation in the human genome.</p></title><aug><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au></aug><source>Expert Rev Mol Med</source><pubdate>2010</pubdate><volume>12</volume><fpage>e8</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1017/S1462399410001390</pubid><pubid idtype="pmpid" link="fulltext">20211047</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>The sequence of the human 
genome.</p></title><aug><au><snm>Venter</snm><fnm>JC</fnm></au><au><snm>Adams</snm><fnm>MD</fnm></au><au><snm>Myers</snm><fnm>EW</fnm></au><au><snm>Li</snm><fnm>PW</fnm></au><au><snm>Mural</snm><fnm>RJ</fnm></au><au><snm>Sutton</snm><fnm>GG</fnm></au><au><snm>Smith</snm><fnm>HO</fnm></au><au><snm>Yandell</snm><fnm>M</fnm></au><au><snm>Evans</snm><fnm>CA</fnm></au><au><snm>Holt</snm><fnm>RA</fnm></au><au><snm>Gocayne</snm><fnm>JD</fnm></au><au><snm>Amanatides</snm><fnm>P</fnm></au><au><snm>Ballew</snm><fnm>RM</fnm></au><au><snm>Huson</snm><fnm>DH</fnm></au><au><snm>Wortman</snm><fnm>JR</fnm></au><au><snm>Zhang</snm><fnm>Q</fnm></au><au><snm>Kodira</snm><fnm>CD</fnm></au><au><snm>Zheng</snm><fnm>XH</fnm></au><au><snm>Chen</snm><fnm>L</fnm></au><au><snm>Skupski</snm><fnm>M</fnm></au><au><snm>Subramanian</snm><fnm>G</fnm></au><au><snm>Thomas</snm><fnm>PD</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Gabor Miklos</snm><fnm>GL</fnm></au><au><snm>Nelson</snm><fnm>C</fnm></au><au><snm>Broder</snm><fnm>S</fnm></au><au><snm>Clark</snm><fnm>AG</fnm></au><au><snm>Nadeau</snm><fnm>J</fnm></au><au><snm>McKusick</snm><fnm>VA</fnm></au><au><snm>Zinder</snm><fnm>N</fnm></au><etal/></aug><source>Science</source><pubdate>2001</pubdate><volume>291</volume><fpage>1304</fpage><lpage>1351</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1058040</pubid><pubid idtype="pmpid" link="fulltext">11181995</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>BLAT--the BLAST-like alignment tool.</p></title><aug><au><snm>Kent</snm><fnm>WJ</fnm></au></aug><source>Genome Res</source><pubdate>2002</pubdate><volume>12</volume><fpage>656</fpage><lpage>664</lpage><xrefbib><pubidlist><pubid idtype="pmcid">187518</pubid><pubid idtype="pmpid" link="fulltext">11932250</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA 
sequencing.</p></title><aug><au><snm>Park</snm><fnm>H</fnm></au><au><snm>Kim</snm><fnm>JI</fnm></au><au><snm>Ju</snm><fnm>YS</fnm></au><au><snm>Gokcumen</snm><fnm>O</fnm></au><au><snm>Mills</snm><fnm>RE</fnm></au><au><snm>Kim</snm><fnm>S</fnm></au><au><snm>Lee</snm><fnm>S</fnm></au><au><snm>Suh</snm><fnm>D</fnm></au><au><snm>Hong</snm><fnm>D</fnm></au><au><snm>Kang</snm><fnm>HP</fnm></au><au><snm>Yoo</snm><fnm>YJ</fnm></au><au><snm>Shin</snm><fnm>JY</fnm></au><au><snm>Kim</snm><fnm>HJ</fnm></au><au><snm>Yavartanoo</snm><fnm>M</fnm></au><au><snm>Chang</snm><fnm>YW</fnm></au><au><snm>Ha</snm><fnm>JS</fnm></au><au><snm>Chong</snm><fnm>W</fnm></au><au><snm>Hwang</snm><fnm>GR</fnm></au><au><snm>Darvishi</snm><fnm>K</fnm></au><au><snm>Kim</snm><fnm>H</fnm></au><au><snm>Yang</snm><fnm>SJ</fnm></au><au><snm>Yang</snm><fnm>KS</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au><au><snm>Scherer</snm><fnm>SW</fnm></au><au><snm>Carter</snm><fnm>NP</fnm></au><au><snm>Tyler-Smith</snm><fnm>C</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Seo</snm><fnm>JS</fnm></au></aug><source>Nat Genet</source><pubdate>2010</pubdate><volume>42</volume><fpage>400</fpage><lpage>405</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.555</pubid><pubid idtype="pmpid" link="fulltext">20364138</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare 
CNVs.</p></title><aug><au><snm>Korn</snm><fnm>JM</fnm></au><au><snm>Kuruvilla</snm><fnm>FG</fnm></au><au><snm>McCarroll</snm><fnm>SA</fnm></au><au><snm>Wysoker</snm><fnm>A</fnm></au><au><snm>Nemesh</snm><fnm>J</fnm></au><au><snm>Cawley</snm><fnm>S</fnm></au><au><snm>Hubbell</snm><fnm>E</fnm></au><au><snm>Veitch</snm><fnm>J</fnm></au><au><snm>Collins</snm><fnm>PJ</fnm></au><au><snm>Darvishi</snm><fnm>K</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Nizzari</snm><fnm>MM</fnm></au><au><snm>Gabriel</snm><fnm>SB</fnm></au><au><snm>Purcell</snm><fnm>S</fnm></au><au><snm>Daly</snm><fnm>MJ</fnm></au><au><snm>Altshuler</snm><fnm>D</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>1253</fpage><lpage>1260</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.237</pubid><pubid idtype="pmcid">2756534</pubid><pubid idtype="pmpid" link="fulltext">18776909</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data.</p></title><aug><au><snm>Colella</snm><fnm>S</fnm></au><au><snm>Yau</snm><fnm>C</fnm></au><au><snm>Taylor</snm><fnm>JM</fnm></au><au><snm>Mirza</snm><fnm>G</fnm></au><au><snm>Butler</snm><fnm>H</fnm></au><au><snm>Clouston</snm><fnm>P</fnm></au><au><snm>Bassett</snm><fnm>AS</fnm></au><au><snm>Seller</snm><fnm>A</fnm></au><au><snm>Holmes</snm><fnm>CC</fnm></au><au><snm>Ragoussis</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>2013</fpage><lpage>2025</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm076</pubid><pubid idtype="pmcid">1874617</pubid><pubid idtype="pmpid" link="fulltext">17341461</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Sparse representation and Bayesian detection of genome copy number alterations from microarray 
data.</p></title><aug><au><snm>Pique-Regi</snm><fnm>R</fnm></au><au><snm>Monso-Varona</snm><fnm>J</fnm></au><au><snm>Ortega</snm><fnm>A</fnm></au><au><snm>Seeger</snm><fnm>RC</fnm></au><au><snm>Triche</snm><fnm>TJ</fnm></au><au><snm>Asgharzadeh</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><fpage>309</fpage><lpage>318</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm601</pubid><pubid idtype="pmcid">2704547</pubid><pubid idtype="pmpid" link="fulltext">18203770</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>