Review
Identifying protein-coding genes in genomic sequences
1 Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK
2 Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, H-1113 Budapest, Hungary
3 Center for Integrative Genomics, Genopode Building, University of Lausanne, CH-1015 Lausanne, Switzerland
4 Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, E-08003 Barcelona, Catalonia, Spain
5 Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva 1211, Switzerland
Genome Biology 2009, 10:201 doi:10.1186/gb-2009-10-1-201
Published: 30 January 2009
Abstract
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.



