Open Access | Highly Accessed | Research

Annotation of the Drosophila melanogaster euchromatic genome: a systematic review

Sima Misra1,2, Madeline A Crosby3, Christopher J Mungall2,4, Beverley B Matthews3, Kathryn S Campbell3, Pavel Hradecky3, Yanmei Huang3, Joshua S Kaminker1,2, Gillian H Millburn5, Simon E Prochnik1,2, Christopher D Smith1,2, Jonathan L Tupy1,2, Eleanor J Whitfield6, Leyla Bayraktaroglu3, Benjamin P Berman1, Brian R Bettencourt3, Susan E Celniker7, Aubrey DNJ de Grey5, Rachel A Drysdale5, Nomi L Harris2,7, John Richter4, Susan Russo3, Andrew J Schroeder3, ShengQiang Shu1,2, Mark Stapleton7, Chihiro Yamada5, Michael Ashburner5, William M Gelbart3, Gerald M Rubin1,2,4,7 and Suzanna E Lewis1,2

1Department of Molecular and Cell Biology, University of California, Life Sciences Addition, Berkeley, CA 94720-3200, USA

2FlyBase-Berkeley, University of California, Berkeley, CA 94720-3200, USA

3FlyBase-Harvard, Department of Molecular and Cell Biology, Harvard University, Biological Laboratories, 16 Divinity Avenue, Cambridge, MA 02138-2020, USA

4Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA

5FlyBase-Cambridge, Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK

6EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

7Department of Genome Sciences, Lawrence Berkeley National Laboratory, One Cyclotron Road Mailstop 64-121, Berkeley, CA 94720, USA

Genome Biology 2002, 3:research0083.1-0083.22 doi:10.1186/gb-2002-3-12-research0083

Published: 31 December 2002


This article is part of a series of refereed research articles from the Berkeley Drosophila Genome Project, FlyBase and colleagues describing Release 3 of the Drosophila genome. The articles are freely available at http://genomebiology.com/drosophila/.

Subject areas: Genome studies, Model organisms, Bioinformatics

Abstract

Background

The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences.

Results

Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes.

Conclusions

Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
