Open Access Research

Analysis of 14 BAC sequences from the Aedes aegypti genome: a benchmark for genome annotation and assembly

Neil F Lobo1*, Kathy S Campbell2*, Daniel Thaner1, Becky deBruyn1, Hean Koo3, William M Gelbart2, Brendan J Loftus3, David W Severson1 and Frank H Collins1

1 Center for Global Health and Infectious Diseases, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556-0369, USA

2 Harvard University, Cambridge, MA 02138, USA

3 TIGR, Rockville, MD, 20850, USA

* Contributed equally

Genome Biology 2007, 8:R88 doi:10.1186/gb-2007-8-5-r88

Published: 22 May 2007

Subject areas: Genome studies, Model organisms, Medicine

Abstract

Background

Aedes aegypti is the principal vector of yellow fever and dengue viruses throughout the tropical world. To provide a set of manually curated and annotated sequences from the Ae. aegypti genome, 14 mapped bacterial artificial chromosome (BAC) clones encompassing 1.57 Mb were sequenced, assembled and manually annotated using a combination of computational gene-finding, expressed sequence tag (EST) matches and comparative protein homology. PCR and sequencing were used to experimentally confirm expression and sequence of a subset of these transcripts.

Results

Of the 51 manual annotations, 50 and 43 demonstrated a high level of similarity to Anopheles gambiae and Drosophila melanogaster genes, respectively. Ten of the 12 BAC sequences with more than one annotated gene exhibited synteny with the A. gambiae genome. Putative transcripts from eight BAC clones were found in multiple copies (two copies in most cases) in the Aedes genome assembly, a finding that points to the probable presence of haplotype polymorphisms and/or misassemblies.

Conclusion

This study provides a benchmark set of manually annotated transcripts for this genome that can be used to assess the quality of the auto-annotation pipeline and the assembly. It also examines the effect of high repeat content on the genome assembly and the annotation pipeline.

