This article is part of the supplement: EGASP '05: ENCODE Genome Annotation Assessment Project

Review

EGASP: the human ENCODE Genome Annotation Assessment Project

Roderic Guigó1,11*, Paul Flicek2, Josep F Abril1, Alexandre Reymond3, Julien Lagarde1, France Denoeud1, Stylianos Antonarakis4, Michael Ashburner5,12, Vladimir B Bajic6,12, Ewan Birney2,11, Robert Castelo1, Eduardo Eyras1, Catherine Ucla4, Thomas R Gingeras7,12, Jennifer Harrow8,11, Tim Hubbard8,11, Suzanna E Lewis9,12 and Martin G Reese10,12*

Author Affiliations

1 Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain

2 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

3 Center for Integrative Genomics, University of Lausanne, Switzerland

4 University of Geneva Medical School and University Hospitals of Geneva, 1211 Geneva, Switzerland

5 Department of Genetics, University of Cambridge, Cambridge CB3 2EH, UK

6 South African National Bioinformatics Institute (SANBI), University of Western Cape, Bellville 7535, South Africa

7 Affymetrix Inc., Santa Clara, California 95051, USA

8 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

9 Department of Molecular and Cellular Biology, University of California, Berkeley, California 94792, USA

10 Omicia Inc., Christie Ave., Emeryville, California 94608, USA

11 Member of the EGASP Organizing Committee

12 Member of the EGASP Advisory Board

Genome Biology 2006, 7(Suppl 1):S2  doi:10.1186/gb-2006-7-s1-s2

Published: 7 August 2006

Abstract

Background

We present the results of EGASP, a community experiment to assess the state of the art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: to assess the accuracy of computational methods for predicting protein-coding genes, and to assess the overall completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment.

Results

The best methods correctly predicted at least one gene transcript for close to 70% of the annotated genes. Nevertheless, multiple-transcript accuracy, which takes alternative splicing into account, reached only approximately 40% to 50%. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.
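
To make the coding-nucleotide figures concrete, the sketch below computes sensitivity and specificity as they are conventionally defined in gene-prediction assessment: sensitivity is the fraction of annotated coding nucleotides that are also predicted, and specificity is the fraction of predicted coding nucleotides that are also annotated. The function names, the half-open (start, end) interval representation on a single sequence, and the toy coordinates are assumptions made for illustration; this is not the EGASP evaluation software.

```python
def coding_positions(exons):
    """Expand half-open (start, end) coding intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    sensitivity = TP / (TP + FN): share of annotated nucleotides predicted
    specificity = TP / (TP + FP): share of predicted nucleotides annotated
    """
    annotated = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy example with made-up coordinates (hypothetical, not EGASP data).
annotated = [(100, 200), (300, 400)]               # 200 annotated nucleotides
predicted = [(110, 200), (300, 390), (500, 520)]   # 200 predicted nucleotides
sn, sp = nucleotide_accuracy(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")  # sensitivity=0.90 specificity=0.90
```

In this toy case 180 nucleotides are shared between annotation and prediction, so both measures come out at 90%: the prediction misses 20 annotated nucleotides and adds 20 unannotated ones.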

Conclusion

This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.