Towards a comprehensive structural variation map of an individual human genome

Andy W Pang1,2, Jeffrey R MacDonald2, Dalila Pinto2, John Wei2, Muhammad A Rafiq2, Donald F Conrad3, Hansoo Park4, Matthew E Hurles3, Charles Lee4, J Craig Venter5, Ewen F Kirkness5, Samuel Levy5, Lars Feuk2,6* and Stephen W Scherer1,2*

Author Affiliations

1 Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8, Canada

2 The Centre for Applied Genomics, The Hospital for Sick Children, 101 College Street, Toronto, Ontario M5G 1L7, Canada

3 Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

4 Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, 221 Longwood Avenue, Boston, Massachusetts 02115, USA

5 J Craig Venter Institute, 9740 Medical Center Drive, Rockville, Maryland 20850, USA

6 Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala 75185, Sweden

Genome Biology 2010, 11:R52  doi:10.1186/gb-2010-11-5-r52

Published: 19 May 2010

Additional files

Additional file 1:

Genetic variation in sequenced genomes.

Format: XLS Size: 25KB

Additional file 2:

Clone library information.

Format: XLS Size: 26KB

Additional file 3:

Mate-pair variants and comparison with various data sets.

Format: XLS Size: 2MB

Additional file 4:

Split-read variants and comparison with various data sets.

Format: XLS Size: 15.8MB

Additional file 5:

Agilent 24 M variants and comparison with various data sets.

Format: XLS Size: 413KB

Additional file 6:

NimbleGen 42 M variants and comparison with various data sets.

Format: XLS Size: 590KB

Additional file 7:

Affymetrix 6.0 variants and comparison with various data sets.

Format: XLS Size: 46KB

Additional file 8:

Illumina 1 M variants and comparison with various data sets.

Format: XLS Size: 24KB

Additional file 9:

Custom Agilent 244 K copy number variants.

Format: XLS Size: 102KB

Additional file 10:

Custom Agilent 244 K copy number variable-scaffolds anchoring information.

Format: XLS Size: 35KB

Additional file 11:

Example of a PCR-validated 84-bp insertion event predicted by the split-read approach. A pair of primers separated by 497 bp was designed surrounding the insertion site. PCR was run with these primers, and the presence of the insertion was resolved by gel electrophoresis. Starting from the right, DNA from five European controls, DNA from Venter and a negative control were loaded in lanes 1 to 5, lane 6 and lane 7, respectively.

Format: TIFF Size: 1.5MB

Additional file 12:

List of validated variants and their primers and probes.

Format: XLS Size: 43KB

Additional file 13:

Example of a qPCR-validated gain in Venter relative to sample NA10851 as detected by the custom Agilent 244 K aCGH. A 4.2-kb CNV was detected on the Celera scaffold GA_x5YUVVTY6, and by qPCR, we found that NA10851 had a heterozygous loss in that region, thus confirming a relative gain in Venter.

Format: PDF Size: 2.1MB

Additional file 14:

A common inversion on 16p12.2 validated by FISH. (a) A 2-Mb website schematic of the region. This 1.1-Mb inversion was detected by the mate-pair method in Venter, as seen in track 'B_Clone'. The track 'Inversions' shows that this inversion was annotated in three other studies [15,17,18]. (b) An image of a four-color FISH experiment revealing that Venter is homozygous for the 16p12.2 inverted allele. Four differentially labeled fosmid probes were scored in >100 interphase FISH experiments, and the order of the probes in Venter was found in the vast majority of experiments (including in seven HapMap controls from four different populations) to be yellow-green-blue-pink. In the absence of the inversion, the order of the probes would be yellow-blue-green-pink, as depicted in the assembly schematic. Therefore, as discussed in the main text, our data suggest that the NCBI build 36 reference either represents a rare allele or is incorrect.

Format: TIFF Size: 2.6MB

Additional file 15:

Comparative analysis of variants discovered in Levy et al. [1] and the current study. The two graphs illustrate the proportion of SVs identified by the assembly comparison method, by our present combined multi-approach strategy (including mate-pair, split-read, CGH arrays and SNP arrays), and the proportion confirmed by both. The x-axis represents size range, while the numbers at the top indicate the total number of calls in a particular size range. As size increases, the number of variants called by assembly comparison decreases significantly, indicating that the method has limited sensitivity for detecting large calls. In contrast, the combined multi-approach strategy of the current study is better suited to finding large variants. (a) Size distribution of gains. (b) Size distribution of losses.

Format: PDF Size: 550KB

Additional file 16:

Cumulative distribution of probe coverage. (a) Agilent 24 M array probe coverage across NimbleGen 42 M variants. The x-axis begins at 5, the minimum number of probes required to call variants on the Agilent array. Hence, the majority of the unconfirmed NimbleGen variants (approximately 70%) were targeted by fewer than five Agilent probes. (b) NimbleGen 42 M array probe coverage across Agilent 24 M variants. The x-axis begins at 10, the number of probes required for the NimbleGen array to make a call.

Format: PDF Size: 580KB

Additional file 17:

A summary list of structural variants that overlap with genomic features.

Format: XLS Size: 60KB

Additional file 18:

Genome-wide distribution of large SVs in Venter. The sites of 2,772 SVs whose position spans >1 kb are shown. Red bars represent insertions or duplications, blue bars represent deletions, and green bars represent inversions.

Format: PDF Size: 504KB

Additional file 19:

A non-redundant set of Venter insertions and duplications.

Format: ZIP Size: 8.3MB

Additional file 20:

A non-redundant set of Venter deletions.

Format: ZIP Size: 9.6MB

Additional file 21:

A non-redundant set of Venter inversions.

Format: XLSX Size: 19KB

Additional file 22:

List of Venter gains that overlap with exons of RefSeq genes.

Format: XLS Size: 457KB

Additional file 23:

List of Venter losses that overlap with exons of RefSeq genes.

Format: XLS Size: 439KB

Additional file 24:

List of Venter gains that overlap with exons of OMIM genes.

Format: XLS Size: 60KB

Additional file 25:

List of Venter losses that overlap with exons of OMIM genes.

Format: XLS Size: 63KB

Additional file 26:

A detailed list of genes that are completely encompassed by non-redundant gains and losses.

Format: XLS Size: 32KB

Additional file 27:

Comparison of Venter SVs with population-based genotyped and SNP-imputable CNVs.

Format: XLS Size: 92KB