Open Access Highly Accessed Software

An integrated computational pipeline and database to support whole-genome sequence annotation

CJ Mungall (1,*), S Misra (2,3), BP Berman (2), J Carlson (4), E Frise (4), N Harris (3,4), B Marshall (2), S Shu (2,3), JS Kaminker (2,3), SE Prochnik (2,3), CD Smith (2,3), E Smith (2,3), JL Tupy (2,3), C Wiel (2,3), GM Rubin (1,2,3,4) and SE Lewis (2,3)

Author Affiliations

1 Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA

2 Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, CA 94720-3200, USA

3 FlyBase-Berkeley, University of California, Berkeley, CA 94720-3200, USA

4 Genome Sciences Department, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

Genome Biology 2002, 3:research0081-0081.11  doi:10.1186/gb-2002-3-12-research0081

This article is part of a series of refereed research articles from Berkeley Drosophila Genome Project, FlyBase and colleagues, describing Release 3 of the Drosophila genome, which are freely available at

Published: 23 December 2002


We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with a flexible architecture that can be adapted and extended.