Open Access Highly Accessed Research

Transcriptome analyses of primitively eusocial wasps reveal novel insights into the evolution of sociality and the origin of alternative phenotypes

Pedro G Ferreira1,6, Solenn Patalano2,3, Ritika Chauhan4, Richard Ffrench-Constant4, Toni Gabaldón1, Roderic Guigó1 and Seirian Sumner2,5*

Author Affiliations

1 Center for Genomic Regulation, Universitat Pompeu Fabra (CRG-UPF), Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain

2 Institute of Zoology, Zoological Society of London, Regent's Park, NW1 4RY, UK

3 The Babraham Institute, Babraham Research Campus, Cambridge, CB22 3AT, UK

4 Centre for Ecology and Conservation, Biosciences, University of Exeter, Tremough, Penryn, TR10 9EZ, UK

5 Current address: School of Biological Sciences, University of Bristol, Woodland Road, Bristol, BS8 1UG, UK

6 Current address: Department of Genetic Medicine and Development, University of Geneva Medical School, 25 Rue Michel Servet 1, 1211 Geneva, Switzerland

Genome Biology 2013, 14:R20  doi:10.1186/gb-2013-14-2-r20

Published: 26 February 2013

Additional files

Additional file 1:

Sections 1 to 10 and Tables S1 and S2. Additional methods and results referred to in the main text can be found here. Table S1: comparison of transcriptome assemblies. Number of transcripts, genes and respective transcript length statistics for the optimal assemblies generated with Version 2.3 and Version 2.5 of GS Newbler Assembler, and Version 3.0.5 of Mira Assembler and Oases (with Illumina paired-end sequence data). Table S2: the numbers of differentially expressed (q > 0.6) genes and transcripts were robust to different numbers of biological replicates.

Format: DOCX Size: 86KB

Open Data

Additional file 2:

Read coverage from the 454 transcriptome assembly. (A) Distribution of the number of 454 reads per transcript (truncated to transcripts with fewer than 500 reads). (B) Distribution weighted by the transcript length (truncated plot). The majority of transcripts had more than 5.19 reads, on average.

Format: TIFF Size: 2.2MB

Additional file 3:

Functional groups identified in the P. canadensis transcriptome. Distribution of the number of genes according to different GO terms from the 454 pooled transcriptome assembly. Categories were filtered by a minimum number of sequences: 50 for cellular component; 200 for biological process; and 100 for molecular function.

Format: TIFF Size: 7.7MB

Additional file 4:

Information on the best BLAST hits in the NR database for each transcript; noncoding RNA potential for genes with and without hits in NR. Tabs: 'besthits_with_NR', information extracted from the best hits for each transcript; 'PortraitScore_transcriptsWithNOHits', 'PortraitScore_transcriptsWithHits', score from the portrait program.

Format: XLS Size: 8.4MB

Additional file 5:

Information on functional analyses and comparison with existing caste expression data from other species. Tabs are as follows.
'Toth2010_Pmetricus_Pcan', mappings between P. metricus and P. canadensis transcripts, their respective expression levels, and whether they are significantly differentially expressed in P. canadensis.
'Whitfield2003_50cDNAS', mappings between the 50 most discriminative cDNAs for worker behaviors in A. mellifera from Whitfield et al. [18] and P. canadensis, with respective expression values.
'GO-Enrichment Differential Expressed (DE) Genes', list of over/under-represented GO terms for genes significantly up- and down-regulated in each caste comparison. Genes were categorized according to biological process (P), molecular function (F) and cellular component (C). Most specific terms were retrieved with Blast2GO and correspond to the leaf nodes in the GO tree; this excludes cases where both parent and child nodes may be included. Column labels are: Test (number of genes in the test set annotated with this GO term); Ref (number of genes in the reference set annotated with this GO term); notAnnotTest (number of genes in the test set NOT annotated with this GO term); notAnnotRef (number of genes in the reference set NOT annotated with this GO term). The 'test' set is the set of genes up-regulated in queens and the 'ref' set is the set of genes up-regulated in workers, and vice versa.
'GO-Enrichment', list of over/under-represented GO terms for the P. canadensis genes that are conserved across the genomes of 6 eusocial insects.
'genes_with_Behavior_GOterm', genes classified with a GO term thought to be relevant to behavior, their respective descriptions and direction of differential expression in P. canadensis.
'genes_accelerated_Pol_Bee_Ants', genes found accelerated in Polistes, bees and ants, and direction of differential expression in P. canadensis.
'genes_accelerated_polistes', genes found accelerated only in Polistes, and direction of differential expression in P. canadensis.
'phylogenetic marker identifiers', a list of the 93 marker genes used in the phylogenetic analyses. These correspond to phylomeDB codes in the Aphid phylome used as the reference markers. Sequences for these identifiers can be obtained from Huerta-Cepas et al. [57].

Format: XLS Size: 353KB
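
The four column labels described for the GO enrichment tab of Additional file 5 (Test, Ref, notAnnotTest, notAnnotRef) form a 2x2 contingency table. The sketch below shows one way such a table could be tested for over/under-representation, assuming a two-sided Fisher's exact test; the file description does not state which test was applied, and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(test, ref, not_annot_test, not_annot_ref):
    """Two-sided Fisher's exact p-value for one GO term.

    Assumption: the four arguments are the Test, Ref, notAnnotTest and
    notAnnotRef counts for a single GO term. The choice of Fisher's exact
    test is an assumption, not something stated in the file description.
    """
    annotated = test + ref                 # genes carrying the GO term
    test_size = test + not_annot_test      # size of the test set
    total = test + ref + not_annot_test + not_annot_ref
    denom = comb(total, test_size)

    def prob(k):
        # Hypergeometric probability of k annotated genes in the test set.
        return comb(annotated, k) * comb(total - annotated, test_size - k) / denom

    k_min = max(0, test_size - (total - annotated))
    k_max = min(annotated, test_size)
    p_obs = prob(test)
    # Sum over all tables that are at least as extreme as the observed one.
    p = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: 3 of 4 test genes and 1 of 4 reference genes annotated.
p_value = fisher_exact_two_sided(test=3, ref=1, not_annot_test=1, not_annot_ref=3)
print(round(p_value, 4))
```

Swapping the test and reference sets (the "vice versa" case in the description) corresponds to exchanging the two columns of the table, which leaves the two-sided p-value unchanged.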

Additional file 6:

Number of mapped reads per individual.

Format: TIFF Size: 3MB

Additional file 7:

Basic information about the assembled transcripts. Tabs: 'transcript_length', length of each transcript; 'transcript_to_gene', correspondence between transcript and gene.

Format: XLS Size: 5.2MB

Additional file 8:

Cumulative distribution of gene expression for the four phenotypes.

Format: TIFF Size: 4.2MB

Additional file 9:

Expression values in RPKM (reads per kilobase per million) for all the individuals, indicating which caste each isotig is up-regulated in.

Format: CSV Size: 9.7MB
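
For readers unfamiliar with the RPKM unit used in Additional file 9, the following minimal sketch shows the standard definition (reads per kilobase of transcript per million mapped reads). The function name and the example numbers are hypothetical and are not taken from the dataset.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript in a library
# of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by both transcript length and library size is what makes values comparable across transcripts of different lengths and across individuals with different numbers of mapped reads.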

Additional file 10:

NOISeq probability values for each transcript for the comparison of each caste (that is, W (worker), Fo (foundress), Q (queen), or C (callow)) versus the others, and between queens and workers. The file gives the expression values for each caste and comparison, the differential expression statistics ('M' and 'D' [12]), the probability of differential expression ('prob'), and 'ranking', a summary statistic of 'M' and 'D'.

Format: CSV Size: 15.6MB
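
As a rough guide to the 'M' and 'D' columns in Additional file 10, the sketch below computes the two statistics as they are defined for NOISeq: M is the log2 ratio of expression between two conditions and D is the absolute difference. The function name, the example values and the zero-replacement constant are assumptions made for illustration; the sketch does not reproduce the 'prob' or 'ranking' columns.

```python
from math import log2


def m_d_statistics(expr_a, expr_b, zero_replacement=0.5):
    """Return (M, D) for one transcript compared between two conditions.

    M = log2(expr_a / expr_b), D = |expr_a - expr_b|.
    Assumption: zero expression values are replaced by a small constant
    so that the log ratio is defined; the value 0.5 is illustrative.
    """
    a = expr_a if expr_a > 0 else zero_replacement
    b = expr_b if expr_b > 0 else zero_replacement
    return log2(a / b), abs(a - b)


# Hypothetical expression values for one transcript in queens and workers.
m, d = m_d_statistics(8.0, 2.0)
print(m, d)  # 2.0 6.0
```

Using both statistics together means a transcript must show a large fold change and a large absolute difference to stand out, which guards against weakly expressed transcripts with unstable ratios.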