Open Access Research

The genome and transcriptome of the enteric parasite Entamoeba invadens, a model for encystation

Gretchen M Ehrenkaufer1, Gareth D Weedall2, Daryl Williams2, Hernan A Lorenzi3, Elisabet Caler3, Neil Hall2,4* and Upinder Singh1,5*

Author Affiliations

1 Division of Infectious Diseases, Department of Internal Medicine, Stanford University, Stanford, California, 94305, USA

2 Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK

3 J Craig Venter Institute, Rockville, Maryland, USA

4 Faculty of Science, King Abdulaziz University, Jeddah, 21589, SA

5 Department of Microbiology and Immunology, Stanford University, Stanford, California, 94305, USA

Genome Biology 2013, 14:R77  doi:10.1186/gb-2013-14-7-r77

Published: 26 July 2013

Additional files

Additional File 1:

Flowchart illustrating the JCVI Eukaryotic Annotation Pipeline (JEAP). The flowchart illustrates the steps and software used in eukaryotic genome annotation and gene family assignment that were applied to the E. invadens genome assembly.

Format: JPEG Size: 154KB

Open Data

Additional File 2:

Putative multi-gene families (of two or more genes) in the genome of E. invadens. Membership of a multi-gene family was defined by sharing the same set of functional protein domains with other genes, rather than by sequence similarity. All genes in the genome are listed, along with membership (or not) of a multi-gene family. In addition, the BLASTP best hit and reciprocal best hit in E. histolytica are shown.

Format: XLSX Size: 642KB

Open Data

Additional File 3:

Mapping statistics for all sequence libraries. Tables recording the total number of reads in each replicate transcriptome library and the total number and percentage of reads aligned to the reference genome sequence. For differential gene expression analysis, Bowtie alignments were of 35 bp reads, allowing up to three mismatches and only retaining uniquely mapped reads (reads that did not align equally well to more than one genome region). For genome annotation-based analyses, Tophat alignments of combined libraries at each time point were of 50 bp reads, using default parameters. The number of introns identified by each alignment was also recorded.

Format: PDF Size: 75KB

Open Data

Additional File 4:

Read counts, normalized gene expression levels, temporal expression profiles and significant differential expression versus baseline expression for all 11,549 loci. Table showing expression data for 11,549 annotated E. invadens genes. For each gene, the gene ID, product description and genomic location are shown, along with read counts in each gene for each sample, normalized gene expression levels (fragments per kilobase per million mapped reads (FPKM)), lower and upper bounds of the 95% confidence interval ('FPKM_conf_low' and 'FPKM_conf_high') for each time point, temporal gene expression profiles (profile IDs relate to profiles shown in Additional file 8) and significantly differentially expressed genes compared to expression in trophozoites or 72 h cysts (for encystation and excystation, respectively).

Format: XLSX Size: 5.5MB

Open Data
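
As a reading aid for the FPKM columns in Additional file 4, the unit can be illustrated with its textbook definition: fragments assigned to a gene, scaled by gene length in kilobases and by library size in millions of mapped fragments. The sketch below is only that definition. The function name and arguments are hypothetical, and it is not the estimator the authors' software applied, which may use a statistical model and an effective gene length.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Textbook FPKM: fragments per kilobase of gene per million mapped fragments.

    Hypothetical helper for illustration; not the estimator used in the study.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) is the same as
    # multiplying the raw count by 1e9 and dividing by both raw quantities.
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Example: 500 fragments on a 2,000 bp gene in a library of 10 million mapped fragments.
example_value = fpkm(500, 2000, 10_000_000)
```

Because the value is normalized by both gene length and library size, FPKMs can be compared between genes of different lengths and between libraries of different depths, which raw read counts cannot.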

Additional File 5:

Correlation of read count values per gene among replicates taken at the same time point. Scatter plots of non-normalized read counts per gene for pairs of replicate libraries per time point. Axes are log-scaled for display purposes.

Format: PDF Size: 411KB

Open Data

Additional File 6:

Results of BLAST search of the Pfam database using translated unannotated transcripts. Table showing the results of searching translated open reading frames of putative transcripts that do not overlap annotated genes against the Pfam database to identify unannotated protein coding genes/pseudogenes.

Format: XLSX Size: 206KB

Open Data

Additional File 7:

Validation of annotated introns by transcriptome mapping. Table recording the status of all 5,894 annotated introns. Predicted introns validated by transcriptome mapping, as well as those where only the 5' or 3' end was validated, are shown. In addition, reads mapped entirely within an intron are counted to identify incorrectly predicted (or incompletely spliced) introns.

Format: XLSX Size: 1.1MB

Open Data

Additional File 8:

Temporal gene expression profiles during encystation and excystation. Expression profiles during encystation and excystation, estimated by the short time course expression miner (STEM) software. Black lines show representative profiles and red lines indicate individual genes assigned to each profile. Each profile is numbered at the top right (these profile numbers are used in Additional file 4) and a P-value indicating the significance of gene enrichment (more genes assigned to the profile than expected by chance) is shown at the bottom left. Clusters of similar profiles are indicated by colored shading.

Format: PDF Size: 219KB

Open Data

Additional File 9:

Results of differential gene expression analysis for all possible pairwise comparisons. Genes identified by Cuffdiff as significantly differentially regulated (FDR <0.01) in each of the 42 pairwise comparisons among time points. For each gene, gene ID, locus position, sample 1 name, sample 2 name, sample 1 FPKM, sample 2 FPKM, log fold change (log2(FPKM_2/FPKM_1)), test statistic, uncorrected P-value and corrected P-value (for FDR <0.01) are shown.

Format: XLSX Size: 2.8MB

Open Data
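
The log fold change column in Additional file 9 is defined as log2(FPKM_2/FPKM_1), so a positive value means higher expression in sample 2. A minimal sketch of that arithmetic follows. The function name is hypothetical, and rejecting non-positive FPKMs is an assumption of this sketch rather than a description of how the authors' software handles zero expression.

```python
import math


def log2_fold_change(fpkm_1: float, fpkm_2: float) -> float:
    """Return log2(FPKM_2 / FPKM_1), the ratio described for Additional file 9.

    Hypothetical helper; non-positive inputs are rejected here as a
    simplifying assumption, since the ratio is undefined at zero.
    """
    if fpkm_1 <= 0 or fpkm_2 <= 0:
        raise ValueError("both FPKM values must be positive in this sketch")
    return math.log2(fpkm_2 / fpkm_1)


# Example: expression rising from 5 FPKM (sample 1) to 40 FPKM (sample 2)
# is an eight-fold increase, i.e. three doublings.
example_change = log2_fold_change(5.0, 40.0)
```

On this scale, each unit corresponds to a doubling or halving of expression, so increases and decreases of the same magnitude are symmetric around zero.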

Additional File 10:

Complete results of Pfam and GO term analysis. Worksheet 1 contains all Pfam domains that were significantly (P < 0.05) enriched in genes up- or down-regulated at 8 h, 24 h and 72 h post-encystation, and at 2 h post-excystation. Pfam accession number, Pfam symbol, a brief description of the domain, total numbers for each Pfam domain in the E. invadens genome, numbers of each domain in the regulated genes, and the P-value for enrichment are shown. Worksheet 2 contains all GO terms that were significantly (P < 0.05) enriched in genes up- or down-regulated at 8 h, 24 h and 72 h post-encystation, and at 2 h post-excystation. GO accession number, a brief description, total numbers of genes in each category in the E. invadens genome, number of genes in each category among the regulated genes, and the P-value for enrichment are shown.

Format: XLSX Size: 36KB

Open Data

Additional File 11:

FPKMs of genes related to meiosis. Expression of all meiosis-related genes during encystation and excystation. Gene ID, description and FPKM values for each time point are shown for both meiosis-specific and meiosis-associated genes.

Format: XLSX Size: 36KB

Open Data

Additional File 12:

Sequences of all primers used in this study. (A) Primers used to generate PCR probes for Northern blotting. For each primer, the ID of the targeted gene, primer orientation and sequence are shown. (B) Primers used in cDNA first strand synthesis. Primer name and sequence are shown.

Format: XLSX Size: 43KB

Open Data