Software

An integrated computational pipeline and database to support whole-genome sequence annotation

CJ Mungall (1)*, S Misra (2,3), BP Berman (2), J Carlson (4), E Frise (4), N Harris (3,4), B Marshall (2), S Shu (2,3), JS Kaminker (2,3), SE Prochnik (2,3), CD Smith (2,3), E Smith (2,3), JL Tupy (2,3), C Wiel (2,3), GM Rubin (1,2,3,4) and SE Lewis (2,3)

Author Affiliations

1 Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA

2 Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, CA 94720-3200, USA

3 FlyBase-Berkeley, University of California, Berkeley, CA 94720-3200, USA

4 Genome Sciences Department, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

Genome Biology 2002, 3:research0081-0081.11  doi:10.1186/gb-2002-3-12-research0081


This article is part of a series of refereed research articles from the Berkeley Drosophila Genome Project, FlyBase and colleagues describing Release 3 of the Drosophila genome. The articles are freely available at http://genomebiology.com/drosophila/.

Published: 23 December 2002

Abstract

We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable architecture.