Open Access Highly Accessed Research

Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control

Robert S Coyne1*, Linda Hannick2, Dhanasekaran Shanmugam3, Jessica B Hostetler4, Daniel Brami5, Vinita S Joardar2, Justin Johnson2, Diana Radune4, Irtisha Singh6, Jonathan H Badger7, Ujjwal Kumar8, Milton Saier8, Yufeng Wang9, Hong Cai9, Jianying Gu10, Michael W Mather10, Akhil B Vaidya10, David E Wilkes11, Vidyalakshmi Rajagopalan11, David J Asai12, Chad G Pearson13, Robert C Findly14, Harry W Dickerson14, Martin Wu15, Cindy Martens16, Yves Van de Peer16, David S Roos17, Donna M Cassidy-Hanley18 and Theodore G Clark18

Author Affiliations

1 Genomic Medicine, J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA

2 Informatics, J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA

3 Biology, University of Pennsylvania, 3451 Walnut St, Philadelphia, PA 19104, USA

4 Joint Technology Center, J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA

5 Informatics, J Craig Venter Institute, 10355 Science Center Drive, San Diego, CA 92121, USA

6 Microbial and Environmental Genomics, J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA

7 Microbial and Environmental Genomics, J Craig Venter Institute, 10355 Science Center Drive, San Diego, CA 92121, USA

8 Biological Sciences, University of California - San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA

9 Biology, University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX 78249, USA

10 Microbiology and Immunology, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, USA

11 Biological Sciences, Indiana University - South Bend, 1700 Mishawaka Avenue, South Bend, IN 46634, USA

12 Undergraduate Science Education Program, Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, MD 20815, USA

13 Cell and Developmental Biology, University of Colorado - Denver, 13001 E. 17th Place, Aurora, CO 80045, USA

14 Infectious Diseases, College of Veterinary Medicine, University of Georgia, 501 DW Brooks Dr, Athens, GA 30602, USA

15 Department of Biology, University of Virginia, 485 McCormick Road, Charlottesville, VA 22903, USA

16 Plant Systems Biology, Ghent University, Technologiepark 927, Ghent, B-9052, Belgium

17 Biology and Penn Genome Frontiers Institute, University of Pennsylvania, 3451 Walnut St., Philadelphia, PA 19104, USA

18 Microbiology and Immunology, College of Veterinary Medicine, Cornell University, C5 181 Veterinary Medical Center, Ithaca, NY 14853, USA

Genome Biology 2011, 12:R100  doi:10.1186/gb-2011-12-10-r100

Published: 17 October 2011

Additional files

Additional file 1:

Table S1 - additional assembly statistics.

Format: XLSX Size: 44KB

Open Data

Additional file 2:

Figure S1 - mean scaffold coverage depth. Mean coverage depth is plotted against scaffold length, showing that, for larger scaffolds, coverage does not diverge greatly from the mean.

Format: PDF Size: 121KB

Additional file 3:

Table S2. (a) Optical map results. Column B lists the scaffold IDs for the 295 scaffolds mapped to the 69 complete and four partial optical chromosome maps (listed in column A from largest to smallest, with the four partial chromosomes at the end). No scaffolds aligned reliably to chromosomes 53, 55, 65 and 66. Column C indicates the orientation of the scaffold sequence relative to the optical map, either end to beginning (EB) or vice versa. 'Chromosome Start' and 'Chromosome End' are calculated from the optical map data and correspond to the positions where each scaffold reliably aligns. 'Scaffold Start' and 'Scaffold End' indicate the portion of the predicted SpeI digest of each scaffold that aligns to the map. All lengths are in base pairs. Among the telomere-containing scaffolds, it is evident that the chromosome and scaffold values are not always in exact agreement with their chromosome-terminal positions due to experimental uncertainty in the optical mapping protocol. In the 18 cases (highlighted in yellow) where SOMA but not MapSolver placed a telomere-containing scaffold, the 'Chromosome Start' and 'Chromosome End' values are simply calculated from the total chromosome length and the length of the scaffold. In total, 242 scaffolds were placed by agreement between MapSolver and SOMA with no input from telomere data; 231 of these were placed by SOMA using the highest confidence MATCH algorithm, 9 using the FILTER algorithm and 2 using the SCHEDULE algorithm (see Materials and methods). Thirty-four scaffolds were placed by agreement between MapSolver, SOMA (33 MATCH, 1 SCHEDULE) and telomere position. Eighteen were placed by agreement between SOMA (9 MATCH, 7 SCHEDULE, 2 FILTER) and telomere position. One was placed on partial chromosome 73 by agreement between MapSolver, SOMA and telomere position, although the optical map position is non-terminal, presumably due to a misassembly (see Results and discussion).

(b) Unmapped telomeric scaffolds. IDs of the 65 telomere-containing scaffolds that did not reliably align to a unique position on the optical map.
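
The placement counts quoted in part (a) can be cross-checked against the stated total of 295 mapped scaffolds. The following is a minimal sketch that only re-adds the numbers given in the description; the dictionary layout and category labels are illustrative, and treating the scaffold on partial chromosome 73 as its own category (with no algorithm stated) is an inference from the totals, not something the table description says explicitly.

```python
# Scaffold counts as quoted in the Table S2 description, grouped by the
# evidence that supported each placement and by SOMA algorithm.
placements = {
    "MapSolver + SOMA (no telomere data)": {"MATCH": 231, "FILTER": 9, "SCHEDULE": 2},
    "MapSolver + SOMA + telomere": {"MATCH": 33, "SCHEDULE": 1},
    "SOMA + telomere": {"MATCH": 9, "SCHEDULE": 7, "FILTER": 2},
    # Assumption: the non-terminal scaffold on partial chromosome 73 is
    # counted separately; its SOMA algorithm is not given in the text.
    "partial chromosome 73 (non-terminal)": {"unspecified": 1},
}

# Subtotal per evidence category, then the grand total.
subtotals = {label: sum(counts.values()) for label, counts in placements.items()}
grand_total = sum(subtotals.values())

for label, n in subtotals.items():
    print(f"{label}: {n}")
print("total mapped scaffolds:", grand_total)
```

Under these assumptions the subtotals are 242, 34, 18 and 1, which sum to the 295 mapped scaffolds listed in column B.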

Format: XLSX Size: 79KB

Additional file 4:

Table S3 - correspondence of predicted genes for ATP synthase subunits of T. thermophila and Ich.

Format: DOCX Size: 25KB

Additional file 5:

Table S4 - non-coding RNAs in the Ich genome.

Format: XLSX Size: 39KB

Additional file 6:

Figure S2 - codon usage. (a) Principal component analysis of relative synonymous codon usage in Ich. (b) Effective number of codons (ENc; a measure of overall codon bias) for each predicted ORF is plotted versus GC3 (the fraction of codons that are synonymous at the third codon position that have either a guanine or a cytosine at that position). The upper limit of expected bias based on GC3 alone is represented by the red curve.
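
The two quantities on the axes of panel (b) can be illustrated with a short sketch. Curves of expected ENc as a function of GC3 are commonly drawn using Wright's (1990) relation, ENc = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3; whether the red curve in this figure uses exactly that form is an assumption here. The function names, the example sequence and the choice to exclude only ATG and TGG (codons with no synonymous alternative) are likewise illustrative, and the input ORF is assumed to be in frame with its stop codon already removed.

```python
def gc3(cds, excluded=frozenset({"ATG", "TGG"})):
    """Fraction of third-position-synonymous codons ending in G or C.

    Assumptions: `cds` is an in-frame coding sequence without its stop
    codon; ATG (Met) and TGG (Trp) are skipped because they have no
    synonymous alternative.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    informative = [c for c in codons if c not in excluded]
    if not informative:
        raise ValueError("no codons with synonymous third positions")
    return sum(c[2] in "GC" for c in informative) / len(informative)


def expected_enc(s):
    """Expected ENc given GC3 = s, using Wright's (1990) relation."""
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Hypothetical toy ORF: ATG GCC AAA TGG -> only GCC and AAA are counted.
example = "ATGGCCAAATGG"
s = gc3(example)
print(s, expected_enc(s))  # 0.5 60.5
```

An ORF whose measured ENc falls well below `expected_enc(gc3(orf))` shows more codon bias than its GC3 content alone would predict.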

Format: PDF Size: 4.8MB

Additional file 7:

Table S5 - mapping of Ich predicted proteins to ortholog groups, phylogeny, kinome annotation and enzyme annotation.

Format: XLSX Size: 699KB

Additional file 8:

Table S6 - ortholog grouping of the predicted proteomes of ciliates. A listing of all unique ortholog groups mapped to Ich, T. thermophila and P. tetraurelia protein coding genes. The total number of genes mapped to each ortholog group for each species is indicated, allowing expansions to be identified. The phyletic profile of the mapped ortholog groups is given in the last column.

Format: XLSX Size: 249KB

Additional file 9:

Table S7 - comparison of kinase families in Ich and selected other species. All identifiable kinase families from Ich are compared with those of other species. The numbers indicate the total number of kinase genes in each species for each kinase family. Colors highlight kinase families that are present in all three ciliates (yellow), missing in Ich but present in the other two ciliates (light blue), and shared between ciliates and apicomplexans only (green). The atypical histidine kinase family, which is greatly expanded in ciliates, is highlighted in pink. Kinase families that are expanded and have at least ten genes in Ich are indicated in red font.

Format: XLSX Size: 57KB

Additional file 10:

Figure S3 - multiple sequence alignment of Ich immobilization antigen peptide sequences. Alignment was generated using MUSCLE [126] and edited by hand. Conserved cysteine residues are enclosed in red rectangles. Hydrophobic regions at the amino and carboxyl termini are shown with yellow highlighting.

Format: PDF Size: 183KB

Additional file 11:

Table S8 - membrane transporter analysis. Proteins are tabulated according to TC number within the Transporter Classification Database (TCDB) [49,50]. Columns G and H present the query and hit topologies, expressed as the number of transmembrane segments (TMSs).

Format: XLS Size: 159KB

Additional file 12:

Table S9 - membrane transporter family distribution.

Format: XLS Size: 35KB

Additional file 13:

Figure S4 - membrane transporter topological distribution. The number of predicted proteins of each topological type (that is, with a given putative number of TMSs) is plotted against the number of TMSs, showing that proteins with one, two or three putative TMSs are substantially less numerous than those with four or six putative TMSs. Proteins with 9 or 10 predicted TMSs are present in much lower numbers, but there are increased numbers with 11 and 12 TMSs. Larger proteins are present in relatively small numbers. In general, transport proteins often have 6 or 12 TMSs, although programs that predict topology are often in error by 1 or 2 TMSs [127].

Format: DOCX Size: 54KB

Additional file 14:

Table S10 - complete listing of all predicted Ich protease-encoding genes.

Format: XLSX Size: 17KB

Additional file 15:

Table S11 - comparative listing of protease-encoding gene classes in ciliates.

Format: XLSX Size: 15KB

Additional file 16:

Figure S5 - comparison of Ich metabolic enzymes painted on KEGG pathways with those of T. thermophila, P. tetraurelia and D. rerio. For each pathway, hyperlinks are provided to view the relevant KEGG map painted in red foreground to indicate enzymes present in Ich and green background to indicate enzymes present in other organisms.

Format: ZIP Size: 1.7MB