Open Access | Research

Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization

John C Marioni (1,2), Natalie P Thorne (1,2), Armand Valsesia (3), Tomas Fitzgerald (3), Richard Redon (3), Heike Fiegler (3), T Daniel Andrews (3), Barbara E Stranger (3), Andrew G Lynch (2), Emmanouil T Dermitzakis (3), Nigel P Carter (3), Simon Tavaré (1,2) and Matthew E Hurles (3)

(1) Computational Biology Group, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK

(2) Computational Biology Group, Department of Oncology, University of Cambridge, Cancer Research UK Cambridge Research Institute, Robinson Way, Cambridge CB2 0RE, UK

(3) The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

Genome Biology 2007, 8:R228 doi:10.1186/gb-2007-8-10-r228

Published: 25 October 2007

Subject areas: Bioinformatics, Genome studies

Abstract

Background

Large-scale, high-throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the occurrence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the resulting complex data and for identifying regions of CNV are still being refined.

Results

We describe a genome-wide technical artifact, spatial autocorrelation or 'wave', that is present in a large dataset used to determine the location of CNVs across the genome. Removing this artifact yields a more biologically meaningful clustering of the data and increases the number of CNVs identified by current calling methods without a major increase in the number of false positives. Moreover, removing this artifact is critical for the development of CNVmix, a novel model-based CNV calling algorithm that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups representing copy number gains or losses.
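To make 'spatial autocorrelation' concrete, the sketch below measures the lag-1 autocorrelation of log2 ratios ordered along the genome: independent probe noise gives values near zero, while a smooth 'wave' gives clearly positive values. The correction shown is a deliberately simple, hypothetical one (a per-probe median profile across samples, regressed out of each sample by least squares) and is not the procedure used in this study. The function names `lag1_autocorrelation`, `remove_shared_wave` and `simulate`, and the sinusoidal toy data, are illustrative assumptions.

```python
import math
import random
import statistics


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of log2 ratios ordered by genomic position."""
    n = len(values)
    mean = sum(values) / n
    c = [v - mean for v in values]
    denom = sum(v * v for v in c)
    if denom == 0:
        return 0.0
    return sum(c[i] * c[i + 1] for i in range(n - 1)) / denom


def remove_shared_wave(samples):
    """Hypothetical wave correction, not the authors' procedure.

    Estimates a wave profile as the per-probe median across samples, then
    regresses each sample on that profile (least squares, no intercept) and
    subtracts the fitted wave. Assumes all samples share probe order and that
    most samples are copy-number neutral at any given probe.
    """
    n_probes = len(samples[0])
    wave = [statistics.median([s[j] for s in samples]) for j in range(n_probes)]
    ww = sum(w * w for w in wave)
    corrected = []
    for s in samples:
        beta = sum(a * w for a, w in zip(s, wave)) / ww if ww > 0 else 0.0
        corrected.append([a - beta * w for a, w in zip(s, wave)])
    return corrected


def simulate(n_samples=9, n_probes=2000, seed=1):
    """Toy data: a smooth sinusoidal artifact plus independent probe noise."""
    rng = random.Random(seed)
    base = [0.3 * math.sin(2 * math.pi * j / 400) for j in range(n_probes)]
    samples = []
    for _ in range(n_samples):
        amp = rng.uniform(0.8, 1.2)  # wave strength differs between hybridizations
        samples.append([amp * b + rng.gauss(0, 0.1) for b in base])
    return samples
```

For example, `raw = simulate()` followed by `fixed = remove_shared_wave(raw)` lets you compare `lag1_autocorrelation` on each sample before and after correction. On real arrays a shared-profile correction like this would also absorb any CNV common to most samples, which is one reason to treat it only as an illustration.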

Conclusion

Removing artifactual 'waves', which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets, and using cross-sample information when identifying CNVs enable more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
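CNVmix is described here only as a model-based caller that uses cross-sample information. As a generic illustration of that idea, not of CNVmix itself, the following sketch takes one summary value per sample for a candidate region (for example, the mean log2 ratio), fits a three-component Gaussian mixture with a shared variance by EM across all samples, and labels each sample loss, normal or gain by its highest posterior probability. The starting means (-0.5, 0, 0.5), the starting variance and all function names are hypothetical choices.

```python
import math

LABELS = ("loss", "normal", "gain")


def _log_normal(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_copy_number_mixture(x, means=(-0.5, 0.0, 0.5), var=0.05, n_iter=100):
    """EM for a 1-D Gaussian mixture whose components share one variance.

    x holds one summary value per sample for a single candidate region.
    The starting means and variance are illustrative assumptions.
    """
    mu = list(means)
    k = len(mu)
    weights = [1.0 / k] * k
    n = len(x)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        resp = []
        for xi in x:
            logs = [math.log(weights[j]) + _log_normal(xi, mu[j], var) for j in range(k)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and the shared variance.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)
            if nj > 1e-12:
                mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
        var = sum(r[j] * (xi - mu[j]) ** 2
                  for r, xi in zip(resp, x) for j in range(k)) / n
        var = max(var, 1e-6)  # floor keeps the likelihood bounded
    return mu, var, weights, resp


def classify_samples(x, **kwargs):
    """Label each sample loss/normal/gain by its highest-posterior component."""
    mu, _, _, resp = fit_copy_number_mixture(x, **kwargs)
    order = sorted(range(len(mu)), key=lambda j: mu[j])  # ascending mean
    label_of = {j: LABELS[rank] for rank, j in enumerate(order)}
    calls = []
    for r in resp:
        best = max(range(len(r)), key=lambda j: r[j])
        calls.append((label_of[best], r[best]))
    return calls
```

Because every sample contributes to the estimates of each group's mean and spread, a sample's call depends on how it sits relative to the whole cohort rather than on a fixed per-sample threshold. With very few samples in a group those estimates become unstable, and the variance floor only partly guards against that.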


© 1999-2008 BioMed Central Ltd unless otherwise stated. Part of Springer Science+Business Media.