Genome Biology

Open Access, Highly Accessed, Software

An integrated computational pipeline and database to support whole-genome sequence annotation

CJ Mungall1*, S Misra3,2, BP Berman2, J Carlson4, E Frise4, N Harris3,4, B Marshall2, S Shu3,2, JS Kaminker3,2, SE Prochnik3,2, CD Smith3,2, E Smith3,2, JL Tupy3,2, C Wiel3,2, GM Rubin3,4,1,2 and SE Lewis3,2

Author Affiliations

1 Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA

2 Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, CA 94720-3200, USA

3 FlyBase-Berkeley, University of California, Berkeley, CA 94720-3200, USA

4 Genome Sciences Department, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

Genome Biology 2002, 3:research0081-0081.11 doi:10.1186/gb-2002-3-12-research0081

Published: 23 December 2002

Abstract

We describe our experience annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with a flexible architecture that can be adapted and extended.