Open Access Research

Update of the Anopheles gambiae PEST genome assembly

Maria V Sharakhova1,2, Martin P Hammond3, Neil F Lobo1, Jaroslaw Krzywinski1,4, Maria F Unger1, Maureen E Hillenmeyer1,5, Robert V Bruggner1, Ewan Birney3 and Frank H Collins1*

Author affiliations

1 Center for Global Health and Infectious Diseases, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN 46556-0369, USA

2 Department of Entomology, College of Agriculture and Life Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061-0319, USA

3 European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

4 Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA

5 School of Medicine - IDP - Biomedical Informatics, Stanford University, Stanford, CA 94305, USA

Citation and License

Genome Biology 2007, 8:R5  doi:10.1186/gb-2007-8-1-r5

Published: 8 January 2007

Abstract

Background

The genome of Anopheles gambiae, the major vector of malaria, was sequenced and assembled in 2002. The initial assembly and analysis made available to the scientific community were complicated by several assembly issues: scaffolds with no chromosomal location, no sequence data for the Y chromosome, haplotype polymorphisms that produced two different assemblies of limited regions of the genome, and contaminating bacterial DNA.

Results

Polytene chromosome in situ hybridization with cDNA clones was used to place 15 unmapped scaffolds (sizes totaling 5.34 Mbp) in the pericentromeric regions of the chromosomes and to orient a further 9 scaffolds. Additional analysis by in situ hybridization of bacterial artificial chromosome (BAC) clones placed 1.32 Mbp (5 scaffolds) in the physical gaps between scaffolds on euchromatic parts of the chromosomes. The Y chromosome sequence information (0.18 Mbp) remains highly incomplete and fragmented among 55 short scaffolds. Analysis of BAC end sequences showed that 22 inter-scaffold gaps were spanned by BAC clones. Unmapped scaffolds were also aligned to the chromosome assemblies in silico, identifying regions totaling 8.18 Mbp (144 scaffolds) that are probably represented in the genome project by two alternative assemblies. An additional 3.53 Mbp of alternative assembly was identified within mapped scaffolds. Scaffolds comprising 1.97 Mbp (679 small scaffolds) were identified as probably derived from contaminating bacterial DNA. In total, about 33% of previously unmapped sequences were placed on the chromosomes.
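The idea of a BAC clone spanning an inter-scaffold gap can be illustrated with a minimal sketch. It assumes that a clone counts as spanning a gap when its two end sequences align to two different scaffolds that are adjacent on the chromosome map. This criterion, the EndHit record, the gap_spanning_bacs function and the scaffold and clone names are all hypothetical and are not taken from the analysis reported here.

```python
from collections import namedtuple

# Hypothetical record: one aligned BAC end sequence and the scaffold it hit.
EndHit = namedtuple("EndHit", ["bac", "scaffold"])


def gap_spanning_bacs(end_hits, scaffold_order):
    """Map each adjacent scaffold pair to the BACs whose ends hit both scaffolds.

    Assumption (not from the source): a BAC spans a gap when its end
    sequences align to exactly two scaffolds that are neighbors in
    scaffold_order.
    """
    # Position of each scaffold along the chromosome map.
    position = {name: i for i, name in enumerate(scaffold_order)}

    # Collect the set of scaffolds hit by the end sequences of each BAC.
    scaffolds_by_bac = {}
    for hit in end_hits:
        scaffolds_by_bac.setdefault(hit.bac, set()).add(hit.scaffold)

    spanned = {}
    for bac, scaffolds in sorted(scaffolds_by_bac.items()):
        # Skip BACs whose ends fall on one scaffold or on unmapped scaffolds.
        if len(scaffolds) != 2 or not all(s in position for s in scaffolds):
            continue
        left, right = sorted(scaffolds, key=position.__getitem__)
        # Keep only pairs of scaffolds that are direct neighbors.
        if position[right] - position[left] == 1:
            spanned.setdefault((left, right), []).append(bac)
    return spanned


# Illustrative input with made-up clone and scaffold names.
example_hits = [
    EndHit("bacA", "s1"), EndHit("bacA", "s2"),  # ends on neighboring scaffolds
    EndHit("bacB", "s2"), EndHit("bacB", "s2"),  # both ends on one scaffold
    EndHit("bacC", "s1"), EndHit("bacC", "s3"),  # scaffolds are not neighbors
]
example_spanned = gap_spanning_bacs(example_hits, ["s1", "s2", "s3"])
```

In this toy input only bacA would be reported, for the gap between s1 and s2. A real analysis would also need to check alignment quality, end orientation and whether the implied distance is consistent with the clone insert size, none of which is modeled in this sketch.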

Conclusion

This study has used new approaches to improve the physical map and assembly of the A. gambiae genome.