Open Access Research

Interrupted coding sequences in Mycobacterium smegmatis: authentic mutations or sequencing errors?

Caroline Deshayes1,2, Emmanuel Perrodou3, Sebastien Gallien4, Daniel Euphrasie1, Christine Schaeffer4, Alain Van-Dorsselaer4, Olivier Poch3, Odile Lecompte3 and Jean-Marc Reyrat1,2

1 Université Paris Descartes, Faculté de Médecine René Descartes, Paris Cedex 15, F-75730, France

2 Inserm, U570, Unité de Pathogénie des Infections Systémiques-Groupe AVENIR, Paris Cedex 15, F-75730, France

3 Laboratoire de Biologie et Génomique Structurales, IGBMC CNRS/INSERM/ULP, BP 163, 67404 Illkirch Cedex, France

4 Laboratoire de Spectrométrie de Masse Bio-Organique, UMR7178, ECPM, rue Becquerel, Strasbourg, F-67087 cedex 2, France

Genome Biology 2007, 8:R20 doi:10.1186/gb-2007-8-2-r20

Published: 12 February 2007

Subject areas: Genome studies, Microbiology and parasitology, Evolution

Abstract

Background

In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may be genuinely present in the organism or may result from misannotation based on sequencing errors. Whether these sequences are authentic has major implications for all subsequent functional characterization steps, including module prediction, comparative genomics and high-throughput proteomic projects.
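To make the notion of an in-frame stop codon concrete, the following is a minimal Python sketch that reports stop codons occurring in a chosen reading frame before the final codon. It is an illustration only and not the detection procedure used in this study; the function name `in_frame_stops` and the two short example sequences are hypothetical, and the standard stop codons TAA, TAG and TGA are assumed.

```python
# Illustrative sketch only: not the pipeline used by the authors.
# Assumes the standard stop codons TAA, TAG and TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq, frame=0):
    """Return 0-based nucleotide positions of stop codons found in the
    given reading frame, excluding the last complete codon."""
    seq = seq.upper()
    # Split the sequence into complete codons starting at `frame`.
    codons = [(i, seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3)]
    # The last complete codon is the expected terminator, so skip it.
    return [i for i, codon in codons[:-1] if codon in STOP_CODONS]


# Hypothetical toy sequences: the second carries a premature TAA.
intact = "ATGGCTGAAACCTAA"
interrupted = "ATGGCTTAAACCTAA"

print(in_frame_stops(intact))       # []
print(in_frame_stops(interrupted))  # [6]
```

The intact sequence contains TGA out of frame, which is ignored because only codons aligned with the reading frame are examined. A frameshift would instead shift which triplets are read, so downstream codons would need to be checked in a different value of `frame`.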

Results

We show here, using Mycobacterium smegmatis as a model species, that a significant proportion of these ICDSs result from sequencing errors. We used a resequencing procedure and mass spectrometry analysis to determine the nature of a number of ICDSs in this organism. We found that 28 of the 73 ICDSs investigated correspond to sequencing errors.

Conclusion

Correcting these errors modifies the predicted amino acid sequences of the corresponding proteins and changes their annotation. We suggest that each bacterial ICDS should be investigated individually, to determine its true status and to ensure that the genome sequence is appropriate for comparative genomics analyses.