Open Access Highly Accessed Research

Genomic characterization of the Yersinia genus

Peter E Chen1, Christopher Cook1, Andrew C Stewart1, Niranjan Nagarajan2,7, Dan D Sommer2, Mihai Pop2, Brendan Thomason1, Maureen P Kiley Thomason1, Shannon Lentz1, Nichole Nolan1, Shanmuga Sozhamannan1, Alexander Sulakvelidze3, Alfred Mateczun1, Lei Du4, Michael E Zwick1,5 and Timothy D Read1,5,6*

  • * Corresponding author: Timothy D Read tread@emory.edu

  • † Equal contributors

Author Affiliations

1 Biological Defense Research Directorate, Naval Medical Research Center, 503 Robert Grant Avenue, Silver Spring, Maryland 20910, USA

2 University of Maryland Institute for Advanced Computer Sciences, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA

3 Emerging Pathogens Institute and Department of Molecular Genetics and Microbiology, University of Florida College of Medicine, Gainesville, Florida 32610, USA

4 454 Life Sciences Inc., 15 Commercial Street, Branford, Connecticut 06405, USA

5 Department of Human Genetics, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA

6 Division of Infectious Diseases, Emory University School of Medicine, 615 Michael Street, Atlanta, Georgia 30322, USA

7 Current address: Computational and Mathematical Biology, Genome Institute of Singapore, Singapore-127726

Genome Biology 2010, 11:R1  doi:10.1186/gb-2010-11-1-r1

Published: 4 January 2010

Additional files

Additional file 1:

Statistics from running DIYA [89] and frameshift detection programs on the eight genomes sequenced in this study and various other enterobacterial genomes downloaded from NCBI.

Format: XLS Size: 39KB

Open Data

Additional file 2:

Results of amosvalidate [37] analysis on the eight genomes of this study.

Format: DOC Size: 29KB

Additional file 3:

These consist of ISfinder [40], RepeatScout [41] and amosvalidate [37] results (GFF format); repeats found by RepeatScout (FASTA format); scaffold files (NCBI AGP format); and information about contig length, read count, estimated repeat number, count in scaffold, and whether or not the contig was placed by SOMA [39].

Format: GZ Size: 278KB

Additional file 4:

Estimates for genome sizes (in Mbp) based on optical map data.

Format: DOC Size: 39KB

Additional file 5:

An E. coli strain with known plasmids was a positive control.

Format: DOC Size: 628KB

Additional file 6:

Sequences of the detected repeat families.

Format: TXT Size: 83KB

Additional file 7:

Y. pestis CO92 signatures longer than 100 bp computed by the Insignia [44] pipeline.

Format: TXT Size: 25KB

Additional file 8:

Sequences of the new genomes that match (that is, invalidate) the Y. pestis CO92 signatures listed in Additional file 7.

Format: TXT Size: 1KB

Additional file 9:

Y. enterocolitica signatures longer than 100 bp computed by the Insignia pipeline.

Format: TXT Size: 13KB

Additional file 10:

Sequences of the new genomes that match (that is, invalidate) the Y. enterocolitica signatures.

Format: TXT Size: 2KB

Additional file 11:

Y. pestis genome plotted with the Insignia-identified repeats and the genome islands identified using IslandViewer [45]. The figure was created using DNAPlotter [106].

Format: PNG Size: 67KB

Additional file 12:

Y. enterocolitica genome plotted with the Insignia-identified repeats and the genome islands identified using IslandViewer [45]. The figure was created using DNAPlotter [106].

Format: PNG Size: 71KB

Additional file 13:

The eight genomes sequenced in this study are represented as pseudocontigs, ordered by a combination of optical mapping and alignment to the closest completed reference genome.

Format: JPEG Size: 2.2MB

Additional file 14:

Whole-genome multiple alignment of the 11 Yersinia genomes, produced by MAUVE, in XMFA format [106].

Format: ZIP Size: 15.1MB

Additional file 15:

The top-level directory consists of a directory called Additional_cluster_files and 5010 directories, one for each multi-protein cluster family. For uploading purposes, this top-level directory has been split into three data files (Additional files 15, 16 and 17).

Within the directory are the following files:

  • PGL1_unique_Yersinia_unclustered.out - list of all protein singletons that MCL did not group into a cluster (see Materials and Methods)

  • PGL1_Yersinia_unique_locus_tags.txt - names of the 11 locus tag prefixes used for each genome

  • PGL1_unique_Yersinia.gff - mapping of each Yersinia protein to a cluster, in tab-delimited GFF

  • PGL1_unique_Yersinia.sigfile - list of the longest protein in each cluster

  • PGL1_unique_Yersinia.summary - summary table of features of each of the clusters

  • PGL1_unique_Yersinia.table - summary table of each protein in the clusters

Within each cluster directory are the following files, where 'x' is the cluster name:

  • PGL1_unique_Yersinia-x.faa - multifasta file of the proteins in the cluster

  • PGL1_unique_Yersinia-x.summary - summary of the properties of the proteins

  • PGL1_unique_Yersinia-x.matches - blast matches between the proteins of the cluster

  • PGL1_unique_Yersinia-x.muscle.fasta - muscle alignment of the proteins

  • PGL1_unique_Yersinia-x.muscle.fasta.gblo - gblocks output of the muscle alignment (that is, the auto-trimmed alignment)

  • PGL1_unique_Yersinia-x.muscle.fasta.gblo.htm - as above, in html format

  • PGL1_unique_Yersinia-x.muscle.tree - treefile from the muscle alignment

  • PGL1_unique_Yersinia-x.sif - matches between proteins in simple interaction format, for display in graphing software

Format: ZIP Size: 18.2MB

Additional file 16:

Continuation of the cluster directory that was split across Additional files 15, 16 and 17 for uploading purposes; the directory layout and file descriptions are given under Additional file 15.

Format: ZIP Size: 14.8MB

Additional file 17:

Continuation of the cluster directory that was split across Additional files 15, 16 and 17 for uploading purposes; the directory layout and file descriptions are given under Additional file 15.

Format: ZIP Size: 10.4MB
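
The directory layout described for Additional files 15 to 17 lends itself to scripted inspection. The following is a minimal Python sketch, not part of the original study, that walks an unpacked top-level directory and counts the proteins in each cluster's multifasta file. It assumes that each cluster directory is named after its cluster ('x') and contains PGL1_unique_Yersinia-x.faa; the function names read_fasta and cluster_sizes are hypothetical and introduced here only for illustration.

```python
from pathlib import Path


def read_fasta(path):
    """Return {header: sequence} for a (multi-)FASTA file."""
    records = {}
    header = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is the text after '>'.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated onto the current record.
            records[header] += line
    return records


def cluster_sizes(top_dir, prefix="PGL1_unique_Yersinia"):
    """Map each cluster directory name to the number of proteins in its .faa file.

    Assumption: the directory for cluster 'x' is itself named 'x'.
    Directories without a matching .faa file (for example
    Additional_cluster_files) are skipped.
    """
    sizes = {}
    for cluster_dir in sorted(Path(top_dir).iterdir()):
        if not cluster_dir.is_dir():
            continue
        faa = cluster_dir / f"{prefix}-{cluster_dir.name}.faa"
        if faa.exists():
            sizes[cluster_dir.name] = len(read_fasta(faa))
    return sizes
```

Calling cluster_sizes on the unpacked directory would return a dictionary keyed by cluster name. If the archives use a different directory naming scheme, only the construction of the .faa path needs to change.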

Additional file 18:

Complete protein sets for the 11 species of Yersinia.

Format: ZIP Size: 13.5MB

Additional file 19:

To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. E. coli was used as an outgroup species.

Format: PDF Size: 96KB

Additional file 20:

To evaluate node support, a majority rule-consensus tree of 1,000 bootstrap replicates was computed. E. coli was used as an outgroup species.

Format: PDF Size: 95KB

Additional file 21:

A curve showing the rate of decline in number of this set as more non-pathogen genomes are added is also included.

Format: DOC Size: 144KB

Additional file 22:

Phylogeny of TTSS component YscN in Yersinia and other enterobacteria species.

Format: DOC Size: 247KB

Additional file 23:

Putative antibiotic resistance genes in the Yersinia genus determined using the Antibiotic Resistance Genes Database [45].

Format: XLS Size: 23KB

Additional file 24:

Calculations for the estimation of Π from aligned Yersinia core genomes.

Format: DOC Size: 23KB