Email updates

Keep up to date with the latest news and content from Genome Biology and BioMed Central.

Open Access Research

Characterizing linkage disequilibrium and evaluating imputation power of human genomic insertion-deletion polymorphisms

James T Lu12, Yi Wang1, Richard A Gibbs13 and Fuli Yu13*

Author Affiliations

1 Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA

2 Department of Structural and Computational Biology, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA

3 Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA

For all author emails, please log on.

Genome Biology 2012, 13:R15  doi:10.1186/gb-2012-13-2-r15

Published: 29 February 2012

Abstract

Background

Indels are an important cause of human variation and central to the study of human disease. The 1000 Genomes Project Low-Coverage Pilot identified over 1.3 million indels shorter than 50 bp, of which over 890 were identified as potentially disruptive variants. Yet, despite their ubiquity, the local genomic characteristics of indels remain unexplored.

Results

Herein we describe population- and minor allele frequency-based differences in linkage disequilibrium and imputation characteristics for indels included in the 1000 Genomes Project Low-Coverage Pilot for the CEU, YRI and CHB+JPT populations. Common indels were well tagged by nearby SNPs in all studied populations, and were also tagged at a similar rate to common SNPs. Both neutral and functionally deleterious common indels were imputed with greater than 95% concordance from HapMap Phase 3 and OMNI SNP sites. Further, 38 to 56% of low frequency indels were tagged by low frequency SNPs. We were able to impute heterozygous low frequency indels with over 50% concordance. Lastly, our analysis also revealed evidence of ascertainment bias. This bias prevents us from extending the applicability of our results to highly polymorphic indels that could not be identified in the Low-Coverage Pilot.

Conclusions

Although further scope exists to improve the imputation of low frequency indels, our study demonstrates that there are already ample opportunities to retrospectively impute indels for prior genome-wide association studies and to incorporate indel imputation into future case/control studies.