Open Access Highly Accessed Research

Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization

John C Marioni1,2, Natalie P Thorne1,2, Armand Valsesia3, Tomas Fitzgerald3, Richard Redon3, Heike Fiegler3, T Daniel Andrews3, Barbara E Stranger3, Andrew G Lynch2, Emmanouil T Dermitzakis3, Nigel P Carter3, Simon Tavaré1,2 and Matthew E Hurles3*

Author Affiliations

1 Computational Biology Group, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK

2 Computational Biology Group, Department of Oncology, University of Cambridge, Cancer Research UK Cambridge Research Institute, Robinson Way, Cambridge CB2 0RE, UK

3 The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

Genome Biology 2007, 8:R228  doi:10.1186/gb-2007-8-10-r228

Published: 25 October 2007

Additional files

Additional data file 1:

Each page of the PDF file corresponds to an individual chromosome (from 2 to 22). On each page the clones on a chromosome are ordered along the x-axis and the HapMap samples (for samples that are not excluded because of the presence of chromosome-wide gains or losses) are plotted on the y-axis. A green/red region on the heatmap indicates that the fitted loess values in this region are consistently greater/less than zero. The samples have been ordered using the Ward agglomeration method and a Euclidean distance metric. The plot across the top of the heatmap indicates the GC content of each probe and the color bar on the right of the heatmap displays the ethnic origin of a sample: blue (YRI), yellow (CEU) and purple (CHB + JPT). The scale along the bottom of each figure gives the location of the cytobands on a chromosome.

Format: PDF Size: 4.2MB

Open Data

Additional data file 2:

Comparison of the performance of loess correction and GC linear regression of the WGTP data.

Format: DOC Size: 60KB
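
For readers unfamiliar with the second approach compared in Additional data file 2, GC linear regression can be thought of as fitting a straight line of log2 ratio against clone GC content and keeping the residuals. The sketch below is a minimal illustration under that assumption only: the function name, the use of ordinary least squares and the toy values are hypothetical and are not taken from the additional file.

```python
def gc_regression_correct(log2_ratios, gc_content):
    """Return residuals of an ordinary least-squares fit of log2 ratio on GC content.

    Assumption: one log2 ratio and one GC fraction per clone, in the same order.
    """
    n = len(log2_ratios)
    if n != len(gc_content) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_gc = sum(gc_content) / n
    mean_lr = sum(log2_ratios) / n
    sxx = sum((g - mean_gc) ** 2 for g in gc_content)
    if sxx == 0:
        raise ValueError("GC content must vary between clones")
    sxy = sum((g - mean_gc) * (r - mean_lr)
              for g, r in zip(gc_content, log2_ratios))
    slope = sxy / sxx
    intercept = mean_lr - slope * mean_gc
    # Corrected value = observed log2 ratio minus the GC-predicted value.
    return [r - (intercept + slope * g)
            for g, r in zip(gc_content, log2_ratios)]


# Hypothetical toy data: four clones with made-up GC fractions and log2 ratios.
gc = [0.30, 0.40, 0.50, 0.60]
ratios = [0.05, 0.02, 0.07, 0.06]
corrected = gc_regression_correct(ratios, gc)
```

Unlike a loess fit along the chromosome, a single straight line in GC content cannot follow local wave shape, which is one reason a comparison of the two approaches is informative.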

Additional data file 3:

Three panels display data from two replicate experiments: (A) fitted loess curves for chromosome 1 (Cy3/Cy5) using dye-labeled nucleotides dCTP (purple) and dUTP (orange); (B) fitted loess curves for chromosome 1 (Cy3/Cy5) in a dye-swap experiment using dye-labeled nucleotides dCTP (purple) and dUTP (orange); (C) smoothed scatterplot of all autosomal dCTP (y-axis) versus dUTP (x-axis) loess fits for the first experiment (Cy3/Cy5). In panel (C) the red line is the regression line fitted to the data; its slope is not negative, indicating that changing the dye-labeled nucleotide does not invert the wave effect.

Format: PDF Size: 1.1MB

Additional data file 4:

The top dendrogram (a) shows the clustering of the uncorrected log2 ratios for the unrelated HapMap samples for all 22 autosomal chromosomes. The heatbar under the dendrogram indicates the ethnic origin of the sample (blue, YRI; yellow, CEU; purple, CHB + JPT). The second dendrogram/heatbar (b) shows the clustering of the corrected log2 ratios on the same chromosomes.

Format: PDF Size: 69KB

Additional data file 5:

The columns of the table give the number of clones on chromosome 4 with log2 ratios outside a threshold of ±0.06 for the three samples (NA11829, NA12044 and NA19093) shown in Figure 5. The rows indicate the number of clones that are identified using this threshold for uncorrected and corrected log2 ratios. A threshold of ±0.06 is necessary in order to identify the red clone that represents a genuine CNV in Figure 5 (log2 ratios of 0.071, -0.0665, 0.0835 prior to wave correction, 0.064, -0.069, 0.085 after wave correction).

Format: PDF Size: 25KB
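
The thresholding described for Additional data file 5 can be illustrated with a short sketch. It uses only the six log2 ratios quoted in the description; the function name and the choice of a strict inequality are assumptions made for illustration.

```python
def count_outside_threshold(log2_ratios, threshold=0.06):
    """Count clones whose absolute log2 ratio strictly exceeds the threshold."""
    return sum(1 for r in log2_ratios if abs(r) > threshold)


# Log2 ratios of the genuine-CNV clone in the three samples, as quoted in the
# description of Additional data file 5.
uncorrected = [0.071, -0.0665, 0.0835]
corrected = [0.064, -0.069, 0.085]

print(count_outside_threshold(uncorrected))       # 3
print(count_outside_threshold(corrected))         # 3
# A slightly stricter, hypothetical threshold would miss the clone in one sample.
print(count_outside_threshold(corrected, 0.065))  # 2
```

The last line shows why the threshold has to be as low as ±0.06: at ±0.065 the corrected value of 0.064 would no longer be called.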

Additional data file 6:

The top plot shows the sensitivity (with confidence intervals) for each of the five replicated validation experiments, labeled A to E in order of increasing standard deviation. Black lines/points show the sensitivity of CNVfinder applied to the uncorrected data, red lines/points show the sensitivity of CNVfinder applied to the corrected data, and blue lines/points show the sensitivity of CNVmix applied to the corrected data. The middle and lower plots show the specificity (with confidence intervals) and the FDR, respectively, for the same experiments, with the same annotation and color scheme.

Format: PDF Size: 20KB
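
As a reminder of what the three quantities plotted in Additional data file 6 measure, the sketch below computes them from a table of true/false positive and negative calls. It uses the standard textbook definitions; the exact way the counts and confidence intervals were obtained for the plots is not given here, and the function name and example counts are hypothetical.

```python
def call_performance(tp, fp, tn, fn):
    """Sensitivity, specificity and false discovery rate from confusion counts.

    Returns None for a quantity whose denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true CNVs that were called
        "specificity": ratio(tn, tn + fp),  # non-CNV clones left uncalled
        "fdr": ratio(fp, fp + tp),          # calls that are false
    }


# Hypothetical counts for one experiment, for illustration only.
result = call_performance(tp=8, fp=2, tn=88, fn=2)
```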

Additional data file 7:

The ID and genomic location of clones that are called as CNVs.

Format: TXT Size: 69KB

Additional data file 8:

Details about whether a sample is flagged as a CNV for each of the called clones (-1 = deletion, 1 = gain, 2 = complex and 0 = normal).

Format: TXT Size: 1.1MB
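
The call coding used in Additional data file 8 can be decoded with a small lookup. The sketch below assumes the codes for one clone have already been read into a list of integers, one per sample; the layout of the text file itself is not described here, so no parsing is attempted, and the function name and example values are hypothetical.

```python
from collections import Counter

# Coding as stated in the description of Additional data file 8.
CALL_LABELS = {-1: "deletion", 0: "normal", 1: "gain", 2: "complex"}


def summarize_calls(codes):
    """Tally the CNV call labels for one clone across samples."""
    return Counter(CALL_LABELS[code] for code in codes)


# Hypothetical codes for a single clone across six samples.
summary = summarize_calls([0, 0, -1, 1, 2, 0])
```

An unknown code raises a `KeyError`, which is deliberate: it is safer to stop than to silently count an unexpected value as normal.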

Additional data file 9:

Each page of the PDF file corresponds to an individual chromosome. On each page the clones on a chromosome are ordered along the x-axis and the 95 samples that were investigated for CNV in [15] are plotted on the y-axis. (Note that we could not obtain mapping information for 1% of the clones and so they were removed from our analysis.) A green/red region on the heatmap indicates that the fitted loess values in this region are consistently greater/less than zero. The samples have been ordered using the Ward agglomeration method and a Euclidean distance metric. The scale along the bottom of each figure gives the location of the cytobands on a chromosome. Note that some of the heatmaps are predominantly red (notably those for chromosomes 19 and 22); this is because the median log2 ratio is consistently less than 0 for these chromosomes.

Format: PDF Size: 4.2MB

Additional data file 10:

Each page of the PDF contains a plot of the log2 ratios for clones (censored at ±0.3) on the long arm of chromosome 7 for five samples analyzed in [15] (samples S20, S30, S32, S40 and S60). The fitted loess curve for this genomic region has been overlaid in blue and the thresholds used in [15] to identify clones harboring CNVs are shown by horizontal dashed gray lines. On all five plots the fitted loess curve has troughs at around 75 and 100 Mb (these are common to all samples; see Additional data file 9), suggesting that they are technical artifacts. Moreover, in all five plots a small number of clones in these regions have log2 ratios that are lower than the threshold and, consequently, they are flagged (almost certainly incorrectly) as harboring a CNV.

Format: PDF Size: 337KB
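
The censoring mentioned for Additional data file 10 simply caps values for display, so that a few extreme clones do not stretch the y-axis. A minimal sketch follows, assuming that "censored at ±0.3" means clipping each log2 ratio to the interval [-0.3, 0.3]; the function name and example values are hypothetical.

```python
def censor(log2_ratios, limit=0.3):
    """Clip each log2 ratio to the interval [-limit, +limit] for plotting."""
    return [max(-limit, min(limit, r)) for r in log2_ratios]


# Hypothetical values: the first two lie outside the plotting range.
print(censor([0.5, -0.45, 0.1]))  # [0.3, -0.3, 0.1]
```

Censoring affects only how the points are drawn; calls are made by comparing the uncensored log2 ratios with the thresholds.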

Additional data file 11:

Details of the CNVmix mixture model.

Format: DOC Size: 54KB