Genome Biology Volume 7 Issue 3
Research
Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms
Jun Li1, Michelle M Riehle1, Yan Zhang1, Jiannong Xu1, Frederick Oduol1, Shawn M Gomez2, Karin Eiglmeier2, Beatrix M Ueberheide3, Jeffrey Shabanowitz3, Donald F Hunt3, José MC Ribeiro4 and Kenneth D Vernick1
1 Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
2 Unité de Biochimie et Biologie Moléculaire des Insectes and CNRS FRE 2849, Institut Pasteur, 75724 Paris Cedex 15, France
3 Department of Chemistry, McCormick Rd, University of Virginia, Charlottesville, VA 22904, USA
4 Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA
Genome Biology 2006, 7:R24 doi:10.1186/gb-2006-7-3-r24
Subject areas: Bioinformatics, Genome studies, Model organisms
Abstract
Background
Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector.
Results
We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to approximately 4%. The reannotated CDS set includes 4,681 novel CDSs that are not represented in the Ensembl annotation but have EST support, and another 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation were assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high-quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analysis of the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download.
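The two-stage idea named above (an exon union followed by open-reading-frame selection) can be illustrated with a minimal sketch. The abstract does not give the algorithmic details, so everything below is an assumption made for illustration: the function names `union_exons` and `longest_orf` are hypothetical, exons are treated as half-open coordinate intervals on a single strand, and "ORF selection" is simplified to choosing the longest ATG-to-stop frame on the forward strand. It should not be read as the authors' implementation.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 0-based, end-exclusive; assumed convention

def union_exons(predictions: List[List[Exon]]) -> List[Exon]:
    """Merge overlapping or abutting exons from several predicted gene models."""
    exons = sorted(e for model in predictions for e in model)
    merged: List[Exon] = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Extend the previous union exon to cover this one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> str:
    """Return the longest forward-strand ORF (ATG through stop codon), or ''."""
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best

# Hypothetical input: two predictors proposing exons at one locus.
ab_initio = [(0, 10), (20, 30)]
comparative = [(5, 15), (40, 50)]
union = union_exons([ab_initio, comparative])
```

In this toy input the two overlapping first exons collapse into one union exon, while the non-overlapping exons are kept as they are. A real pipeline would also need to handle strand, splice-site consistency, and incomplete models that lack a start or stop codon.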
Conclusion
Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms.