Signal sequence analysis of expressed sequence tags from the nematode Nippostrongylus brasiliensis and the evolution of secreted proteins in parasites

1 Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh, EH9 3JT, UK
2 Department of Biological Sciences, Imperial College London, London SW7 2AZ, UK
3 Current address: Program in Genetics and Genomic Biology, Hospital for Sick Children, University Avenue, Toronto, Ontario M5G 1X8, Canada
4 Current address: Facultad de Química, Cátedra de Inmunología, Universita de la Republica, Montevideo 11300, Uruguay

Genome Biology 2004, 5:R39 doi:10.1186/gb-2004-5-6-r39
Subject areas: Microbiology and parasitology, Evolution, Genome studies, Cell biology
The electronic version of this article is the complete one and can be found online at: http://genomebiology.com/2004/5/6/R39
© 2004 Harcus et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Abstract

Background
Parasitism is a highly successful mode of life and one that requires suites of gene adaptations to permit survival within a potentially hostile host. Among such adaptations is the secretion of proteins capable of modifying or manipulating the host environment. Nippostrongylus brasiliensis is a well-studied model nematode parasite of rodents, which secretes products known to modulate host immunity.

Results
Taking a genomic approach to characterize potential secreted products, we analyzed expressed sequence tag (EST) sequences for putative amino-terminal secretory signals. We sequenced ESTs from a cDNA library constructed by oligo-capping to select full-length cDNAs, as well as from conventional cDNA libraries. SignalP analysis was applied to predicted open reading frames, to identify potential signal peptides and anchors. Among 1,234 ESTs, 197 (~16%) contain predicted 5' signal sequences, with 176 classified as conventional signal peptides and 21 as signal anchors. ESTs cluster into 742 distinct genes, of which 135 (18%) bear predicted signal-sequence coding regions. Comparisons of clusters with homologs from Caenorhabditis elegans and more distantly related organisms reveal that the majority (65% at P < e-10) of signal peptide-bearing sequences from N. brasiliensis show no similarity to previously reported genes, and less than 10% align to conserved genes recorded outside the phylum Nematoda. Of all novel sequences identified, 32% contained predicted signal peptides, whereas this was the case for only 3.4% of conserved genes with sequence homologies beyond the Nematoda.

Conclusions
These results indicate that secreted proteins may be undergoing accelerated evolution, either because of relaxed functional constraints, or in response to stronger selective pressure from host immunity.

Background
A central tenet of parasitology is that parasites must secrete biologically active mediators that modify or customize their niche within the host in order to survive immune attack. Such secretions have long been the focus of biochemical and immunological analyses [1-4]. With larger-scale genomic approaches now possible, a screen can be designed in which the characteristic signal sequences, necessary for proteins to exit the eukaryotic cell via the secretory pathway, can be identified by bioinformatic methods [5-9]. We describe here an analysis of this nature, applied to a widely used model system, Nippostrongylus brasiliensis, the gastrointestinal nematode of rats [10-12].

N. brasiliensis biology encapsulates many key aspects of parasite infection and immunology. It is a multicellular metazoan belonging to the phylum Nematoda, which together with the platyhelminth groups (Cestoda and Trematoda) are collectively known as helminths. Helminth infections are typically accompanied by a polarized type-2 (Th2) immune response, characterized by IgE antibody production, eosinophilia and mastocytosis [13-15]. N. brasiliensis drives extremely strong Th2 responses [16], and this bias can be reproduced with secreted proteins collected from parasites in vitro [17]. More than 100 secreted proteins have been found by two-dimensional SDS-PAGE analysis (Y.H. and R.M.M., unpublished work), and among those experimentally verified are acetylcholinesterases [18-20], cysteine proteases [21,22], and a hydrolase that degrades an important host inflammatory mediator, platelet activating factor [23,24].

The molecular biological analysis of N. brasiliensis genes and gene products is at a very early stage. 
Secreted and intracellular globins have been characterized [25], and genes for both secretory [26,27] and neuronal [28] acetylcholinesterases cloned. A recombinant cystatin (cysteine protease inhibitor) has been shown functionally to inhibit host antigen-processing pathways [29]. Structural genes for both tubulin [30] and a keratin-like protein [31] have been described, and an α-crystallin-like small heat-shock protein (Hsp20) has been reported [32]. However, these studies on individual genes have yet to be complemented by higher-throughput molecular analyses. The potential of N. brasiliensis as an experimental system for functional genomics has been greatly enhanced by the demonstration of successful RNAi knockdown in this species [33]. The genomes of parasitic nematode species are between 60 and 250 megabases (Mb) in size [34], and there are more than 20 species of medical, veterinary and scientific importance [35]. Over the past decade, the most tractable way of applying genomics to this group of organisms has been by expressed sequence tag (EST) projects [36]. Large-scale EST sequencing of the human filarial parasite Brugia malayi [37,38] has been followed by similar studies in the sheep intestinal worm Haemonchus contortus [39], human hookworms [40], the river-blindness parasite Onchocerca volvulus [41], and important plant-parasitic species such as Meloidogyne incognita [42]. Smaller projects have added Litomosoides sigmodontis [43], Toxocara canis [44] and many other related species to the available database of parasitic nematode sequences [36]. In designing a study on N. brasiliensis, we wished to focus on the potential for secreted proteins that may interact with the host immune system. We therefore conducted an EST project that included a cDNA library specifically enriched for full-length inserts [45], allowing analysis of amino-terminal signal peptides to be carried out. 
The evolutionary history of secreted immunomodulators is likely to be that of recent adaptation from ancestral genes which fulfilled other functions in free-living ancestors. Comparative studies on nematodes can take advantage of full-genome information available for the free-living species Caenorhabditis elegans [46] and C. briggsae [47], which are quite closely related to N. brasiliensis [48]. If rapid evolution of secreted gene products was required for efficient parasitism, this may be evident in greater diversity among signal peptide-bearing sequences than among genes coding for non-secreted proteins. We report here our results that support this hypothesis.

Results and discussion

A high proportion of N. brasiliensis ESTs encode proteins with predicted signal sequences
A total of 1,234 ESTs were collected from adult N. brasiliensis cDNA libraries constructed either by conventional means or by an oligo-capping method to select full-length cDNAs [45]. A full analysis of these has been posted on our website [49]. ESTs were then analyzed by SignalP, which predicted that 16.0% of total ESTs (197/1,234) contained either 5' signal peptide sequences (176/1,234) or signal anchors (21/1,234, Table 1). The oligo-capped cDNA library yielded a notably higher proportion of sequences with predicted signal peptides (20.4%) than did conventional cDNA libraries (10.1%).

Table 1. Analysis of transcripts represented in conventional and oligo-capped cDNA libraries

The dataset was then clustered to account for multiple ESTs from highly expressed genes, and ESTs were assigned to 742 clusters, including 567 singletons. The proportion of clusters bearing potential signal sequences remained high (135/742; 18.2%), confirming that the dataset is not skewed by over-representation of a few abundant transcripts. The overall proportion of cDNAs encoding predicted signal peptides is within the 15-25% range estimated by analysis of whole-genome sequence data [50]. 
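
The proportions quoted in this section can be rechecked with a few lines of arithmetic. The sketch below is only a convenience for the reader: the counts are those reported in the text, while the helper name `percent` and the variable names are illustrative assumptions, not part of the original analysis.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts reported in the text.
TOTAL_ESTS = 1234
SIGNAL_PEPTIDE_ESTS = 176
SIGNAL_ANCHOR_ESTS = 21
TOTAL_CLUSTERS = 742
SIGNAL_POSITIVE_CLUSTERS = 135

# ESTs with any predicted signal sequence (peptides plus anchors).
signal_positive_ests = SIGNAL_PEPTIDE_ESTS + SIGNAL_ANCHOR_ESTS

est_fraction = percent(signal_positive_ests, TOTAL_ESTS)
cluster_fraction = percent(SIGNAL_POSITIVE_CLUSTERS, TOTAL_CLUSTERS)

print(f"Signal-positive ESTs: {signal_positive_ests}/{TOTAL_ESTS} = {est_fraction}%")
print(f"Signal-positive clusters: {SIGNAL_POSITIVE_CLUSTERS}/{TOTAL_CLUSTERS} = {cluster_fraction}%")
```

Because the per-cluster figure stays close to the per-EST figure, redundancy among abundant transcripts does not appear to inflate the estimate.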
Of all predicted signal-sequence-bearing clones or clusters from N. brasiliensis, around 90% were classified as conventional signal peptides associated with export and secretion into the extracellular environment. The remaining approximately 10% were identified as potential signal anchors, in which the hydrophobic amino-terminal segment is retained, without cleavage, as a transmembrane domain for type II plasma membrane proteins [7].

Presence of trans-spliced leaders in N. brasiliensis
All nematodes undergo trans-splicing at the 5' end of a proportion of their mRNA transcripts; a short leader sequence is added upstream of the initiation codon. The leader is normally a 22-nucleotide sequence termed SL1 [51]. The precise SL1 sequence is highly conserved throughout the phylum, although the degree to which transcripts are trans-spliced varies between different nematode species [52]. To evaluate the prominence of SL1-trans-splicing in N. brasiliensis, we searched the 1,234 ESTs with the 3' 14 nucleotides of SL1, to allow for any minor truncation of cDNAs. Only 37 matches were found, all from the oligo-capped cDNA library (from 500 ESTs, giving a frequency of 7.4%); a few clones from the conventional libraries had 10 or fewer nucleotides identical to the SL1 sequence at their 5' termini. Although the overall frequency of trans-splicing in N. brasiliensis is not yet known, this level is well below those of other species, such as C. elegans. Moreover, transcripts bearing the spliced leader (and its unique tri-methylguanosine cap) are, in certain species, under-represented by the method we used to selectively amplify full-length mRNAs [45]. Hence the true extent of trans-splicing may be higher than the proportion evident in the current dataset.

N. brasiliensis sequences show closest similarity to those of other trichostrongyles
N. brasiliensis is a strongylid nematode, closely related to veterinary parasites such as Haemonchus contortus and Teladorsagia (previously Ostertagia) circumcincta in the Superfamily Trichostrongyloidea, and within the Order Strongylida which includes human hookworm pathogens Ancylostoma duodenale and Necator americanus [53]. The closest free-living taxa to the Strongylida are members of the Rhabditina, including C. elegans, and both are grouped in Clade V of the Nematoda, on the basis of small subunit rRNA sequence analysis [48]. A more objective technique for visualizing the evolutionary relationships between species for which large datasets are available is to use SimiTri, which plots in two-dimensional space the relative similarities of gene sequences between one species (N. brasiliensis) and three comparators [54]. As shown in Figure 1a, N. brasiliensis sequences group slightly closer to Haemonchus than to Ancylostoma, consistent with the relationship described above. Likewise, in Figure 1b, N. brasiliensis sequences group more towards Teladorsagia than Necator.
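
The idea of plotting one sequence against three comparators in two dimensions can be illustrated with a score-weighted position inside a triangle. The sketch below is not SimiTri's actual code or algorithm: it assumes, purely for illustration, that each comparator occupies one corner of an equilateral triangle and that a sequence is placed at the similarity-score-weighted average of the corners. The function name `triangle_position` and the corner coordinates are hypothetical.

```python
import math

# Assumed layout: one comparator species at each corner of an equilateral triangle.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))

def triangle_position(score_a, score_b, score_c):
    """Place a sequence inside the triangle according to its relative
    similarity scores against comparators A, B and C.

    Returns (x, y), or None when the sequence matches no comparator.
    """
    total = score_a + score_b + score_c
    if total <= 0:
        return None  # no similarity to any comparator, so nothing to plot
    weights = (score_a / total, score_b / total, score_c / total)
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return (x, y)

# A sequence matching only comparator A sits on A's corner; equal scores
# against all three comparators place it at the centre of the triangle.
print(triangle_position(100, 0, 0))
print(triangle_position(80, 80, 80))
```

Under this assumed scheme, a cloud of points drifting towards one corner indicates that the query species is, on the whole, more similar to that comparator than to the other two.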
A compilation of the N. brasiliensis clusters, for which assigned homologs exist in protein databases, is presented in Table 2. Many sequences with high similarities to biosynthetic, structural, signaling and regulatory pathway proteins can readily be identified, corresponding to predicted nuclear or cytoplasmic proteins. Interestingly, multiple clusters encode categories of genes which are prominent in other nematode parasites, such as the five clusters encoding homologs of Ancylostoma secreted protein [2], five clusters of C-type and S-type lectins [55] and seven clusters for cysteine proteinases [56].

Table 2. ESTs from adult cDNAs with known homologs, classified by function

Proteins bearing signal sequences are less evolutionarily conserved
The set of 742 clusters was then divided into three categories according to their similarity to existing database sequences. 'Conserved' genes were defined as those with similarities to any non-nematode database entry above a given cutoff score; 'nematode-specific' genes were similar only to sequences from C. elegans or other nematode species, and 'novel' showed no similarity to any existing entry. BLASTX cutoff scores of 50 (P < e-6) and 80 (P < e-10) were both used to define these categories at different levels. Using the more stringent criterion, roughly one third (27-37%) of clusters fell into each category (Figure 2a), while the lower cutoff resulted in approximately half (48%) being classified as conserved, with the remainder evenly divided between nematode-specific (25%) and novel (27%).
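
The three-way classification described above can be written as a short decision rule. This is a minimal sketch under stated assumptions: the function name `classify_cluster` and its arguments are hypothetical, each cluster is assumed to be summarized by its best BLASTX score against non-nematode entries and its best score against nematode entries, and a score equal to the cutoff is treated as a hit (the text says only "above a given cutoff").

```python
def classify_cluster(best_non_nematode_score, best_nematode_score, cutoff=80):
    """Assign a cluster to 'conserved', 'nematode-specific' or 'novel'.

    Scores are the best BLASTX scores against non-nematode and nematode
    database entries; use 0 when there is no hit at all.
    """
    if best_non_nematode_score >= cutoff:
        return "conserved"          # similar to at least one non-nematode entry
    if best_nematode_score >= cutoff:
        return "nematode-specific"  # similar only to nematode sequences
    return "novel"                  # no similarity at this cutoff

# The same cluster can change category with the cutoff: a weak non-nematode
# hit (score 60) counts at cutoff 50 but not at cutoff 80.
print(classify_cluster(60, 90, cutoff=50))
print(classify_cluster(60, 90, cutoff=80))
print(classify_cluster(0, 0, cutoff=50))
```

This also shows why the lower cutoff enlarges the conserved category: weaker matches outside the Nematoda are accepted, moving clusters out of the nematode-specific and novel groups.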
The distribution of clusters containing signal sequences was, however, remarkably skewed towards the novel category. Because the primary classification of 92 novel genes was based on 5' EST sequences, all clusters initially designated as novel signal-sequence positive were further scrutinized. In 72 cases, clusters read through to a 3' poly(A) tail (either single reads from clones of 700 or fewer nucleotides or overlapping ESTs with at least one poly(A) tail present); in 20 cases, where no poly(A) tail was observed, 3' sequencing was carried out. Of these, three showed database homologies from 3' sequence and were reclassified as conserved, and two showed no poly(A) tail and were excluded from further analysis as presumed internal fragments. The remaining 15 clusters showed overlap between 3' and 5' cluster reads, without revealing any additional similarities. Thus, a total of 87 clusters were verified as novel signal-sequence positive. Taking this more rigorously defined subset, some 65% (87/133) of sequences are predicted to encode either signal peptides or signal anchors when classified as novel at the higher cutoff (49% at the lower level), and only 4% were found in the conserved category (7% at the lower cutoff). Moreover, 32% of all novel sequences contained a signal peptide or anchor, compared to 18% of nematode-specific and only 3.4% of conserved. Although the latter category will include many structural and housekeeping proteins for which secretion is unlikely to confer a selective advantage, the data suggest that nematode secreted proteins have diversified more rapidly than those that do not enter the secretory pathway. This association between signal peptides and novel proteins may be falsely amplified where, for example, conserved domains are sufficiently distant from the amino terminus to have been omitted from EST sequences. 
Equally, some clones will have been sequenced from truncated transcripts, and a proportion of those erroneously classified as encoding non-signal sequence bearing proteins. However, neither of these considerations seems likely to account for the very large disparity in signal sequence frequency between the three categories we describe. A more general caveat with these analyses is that SignalP is a fallible prediction tool, with an accuracy of 70% or less when applied to non-mammalian sequences [6]. There is no reason, however, to expect that false-positive assignations would occur disproportionately in the novel group rather than the conserved, and the conclusion drawn here would remain valid over a wide range of prediction accuracies.

Has there been evolutionary acquisition of signal peptides?
The subset of signal-peptide-encoding N. brasiliensis clusters with similarity to predicted genes from C. elegans with either assigned function or of no known function was then identified. Examples of each category are given in Table 3. Some nine clusters were identified as bearing signal-peptide sequences, where in each case the C. elegans homologs appear not to possess a signal-peptide motif. Five of these clusters represent globins, which have previously been noted to possess signal peptides in N. brasiliensis even though the C. elegans paralogs do not [25,57]. One cluster (NBC00028) is almost identical to the recorded cuticular isoform precursor (P51536), but four additional clusters represent new members of this family in N. brasiliensis bearing signal peptides. In contrast, a distinct globin (NBC00095) closely related to the known body-wall isoform (P51535) lacks a predicted signal peptide. Hence, gene duplication may have predated the development, in some globin forms, of a secretory function.

Table 3. ESTs from adult cDNAs with predicted amino-terminal signal peptides and with homologs in C. elegans

In these cases, and in the four additional examples given in Table 3, it is possible that pre-existing genes have been adapted for secretion or membrane expression in order to promote parasitism. Acquisition of secretory signals may not, in evolutionary terms, be demanding, in view of the report that approximately 20% of protein-coding fragments from Saccharomyces cerevisiae can function as a signal peptide [58]. In the case of the globins, conversion to the secretory pathway (as well as gene multiplication) may be interpreted as a physiological adaptation to the environment within the mammalian gastrointestinal tract [57]. Whether any of the four remaining genes in this category might have undergone a similar evolutionary process to counter immune attack is unknown at this stage. Similar findings have previously been reported in individual genes from other nematode parasites. In B. malayi, the microfilarial secreted serpin gene (Bm-spn-2) is homologous to eight C. elegans genes, none of which encodes a signal peptide [59]. Likewise, the extracellular glutathione-S-transferase gene, Ov-gst-1, of Onchocerca volvulus has acquired a signal-peptide sequence [60], as has a gene for keratin-like protein (KLP) in N. brasiliensis itself [31]. Hence, conversion of key gene products to secretory function may be a common adaptive strategy for parasitic organisms.

Conclusions
Our study raises both methodological and evolutionary questions. First, it remains to be determined how valid is the assumption that signal sequences reflect secretion into the parasite environment. Clearly, this notion must be qualified in a metazoan parasite, because many such proteins will remain on the cell surface or be sorted to extracellular and extracytosolic compartments within the worm. 
However, the extent to which signal-peptide-bearing proteins are truly exported by these multicellular organisms will be clarified by current proteomic analyses on proteins secreted by the same adult-stage parasites as were used to construct the cDNA libraries. The same studies will answer a further methodological caveat: proteins can be secreted by non-signal-sequence-dependent pathways, and we have no information on the extent to which parasites may avail themselves of this possibility. One example already exists, of the macrophage migration inhibitory factor homolog of B. malayi which is exported despite lacking a signal peptide [61,62]. On a broader platform, we have addressed the question of whether secreted proteins of parasitic nematodes show accelerated evolution, and our results indicate that this is the case. The predominance of predicted secreted proteins in the novel class prevents us, at this stage, from discerning whether rapid evolution was consequent upon acquiring secretory status, or if the more divergent gene products were those most advantageous to co-opt into secretion. Parallel studies on other parasitic nematodes would now clarify these and additional issues. Have genes for parasite secreted proteins indeed acquired signal peptides, or have free-living lineages lost these motifs in the genes in question? Is more rapid diversification of secreted proteins a specific feature of parasitic nematodes, or can a similar phenomenon be observed in comparisons between divergent free-living organisms (such as C. elegans and C. briggsae)? These questions are now under study.

Materials and methods

Parasite material
N. brasiliensis was maintained in Sprague-Dawley rats as previously described [10,63]. For cDNA synthesis, adult worms were recovered from gastrointestinal contents 5 or 6 days following subcutaneous injection of 3,000 infective L3 larvae. 
Adults were recovered by Baermannization in saline at 37°C, washed 6 × in saline and 6 × in RPMI1640 containing 100 μg/ml penicillin and 100 U/ml streptomycin. Worms were incubated with 10% gentamicin for 20 min and then washed a further 6 × in RPMI1640 with antibiotics before immersion in Trizol for mRNA preparation.

cDNA libraries
Conventional libraries were constructed in Uni-Zap (Stratagene) and propagated in pBluescript SK+ from mixed adult worm mRNA as previously described [27]. To construct an oligo-capped cDNA library, the technique of Fernández [45] was followed. mRNA was isolated from 1 ml of packed adult N. brasiliensis (approximately 10,000 worms) homogenized in 10 ml Trizol (Gibco Life Technologies). The homogenate was centrifuged (12,000g, 10 min), and the supernatant extracted with chloroform before isopropanol precipitation of RNA from the aqueous phase. mRNA was then purified with PolyA Purist oligo-dT cellulose (Ambion). Following dephosphorylation with calf intestinal phosphatase, mRNA was treated with tobacco acid pyrophosphatase to remove the 7-methylguanosine terminal cap on full-length mRNAs, leaving these with a reactive phosphate group. These were then adducted with the GeneRacer oligonucleotide (Invitrogen). Reverse transcription of mRNA was primed with a tagged oligo-dT (NotI primer-adapter). In this way, full-length transcripts contained specific extension sequences (5' Gene Racer and 3' oligo-dT tag) amenable to PCR amplification. Following PCR, products were ligated at both ends to SalI adapters, so that subsequent digestion with NotI provided inserts with cohesive ends to be directionally cloned into NotI/SalI-digested pSPORT1 vector.

EST sequencing
The library was used to transform DH10B Escherichia coli by electroporation, plated on ampicillin agar petri dishes, and colonies picked for sequencing. 
All colonies picked were grown overnight in 96-well plates, which were used to provide template samples for PCR before being directly archived. PCR reactions used M13 forward and reverse primers, and following shrimp alkaline phosphatase/exonuclease I treatment, products were directly sequenced with T7 primer on ABI automated sequencers. Archived clones are available on request from R.M.M. Where 3' sequencing was required, T3 primer was used.

Bioinformatics
Raw sequence trace data were processed to screen out vector and linking sequence, to remove low-quality sequence, and to trim poly(dA) tails using an in-house software solution. The resulting sequences were annotated with similarity information and library details and submitted to dbEST. To identify the nonredundant set of putative gene objects, sequences were clustered on the basis of sequence similarity using the CLOBB program [64]. Consensus sequences representing the putative gene objects were then generated from clusters containing more than one sequence using the assembly program phrap (Phil Green, University of Washington; available from [65]). Clusters containing only a single sequence ('singletons') and the consensuses generated from clusters containing more than one sequence ('clusters') were then subjected to the following BLAST analyses: BLASTN against a nonredundant DNA database (GenBank); BLASTX against a nonredundant protein database (SwissProt-trEMBL) and BLASTN against dbEST. Results from these analyses are available from our online database - NEMBASE [49]. Peptide predictions were performed on individual sequences using the program DEcoder [66]. Where DEcoder was unable to predict a peptide, ESTscan [67] was used. SignalP V2.0 [6] was used to predict the presence of secretory signal peptides and signal anchors for each of the predicted proteins. 
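
The way the two SignalP outputs are combined into a final call, as defined in this section, amounts to a simple voting rule: a signal peptide requires the hidden Markov model (HMM) to predict a secretory leader and three of the four neural-network parameters to be fulfilled, and a signal anchor requires an HMM anchor prediction and two of the four. The sketch below encodes that rule. It does not call SignalP itself: the function name `call_signal` and its inputs are hypothetical, and "three of the four" and "two of the four" are read here as "at least three" and "at least two", which is an assumption.

```python
def call_signal(hmm_prediction, nn_criteria_met):
    """Combine HMM and neural-network evidence into a single call.

    hmm_prediction: 'peptide', 'anchor' or 'none' (the HMM verdict).
    nn_criteria_met: how many of the four neural-network parameters
        (C-score, Y-score, S-score, S-mean) were fulfilled, 0 to 4.
    """
    if not 0 <= nn_criteria_met <= 4:
        raise ValueError("nn_criteria_met must be between 0 and 4")
    if hmm_prediction == "peptide" and nn_criteria_met >= 3:
        return "signal peptide"
    if hmm_prediction == "anchor" and nn_criteria_met >= 2:
        return "signal anchor"
    return "no signal sequence"

# Both models must agree: neural-network support alone is not enough.
print(call_signal("peptide", 3))
print(call_signal("anchor", 2))
print(call_signal("none", 4))
```

Requiring agreement between the two models trades some sensitivity for fewer false-positive calls.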
Peptides were defined as bearing a signal peptide if both the hidden Markov model (HMM) predicted the presence of a secretory leader and three of the four parameters defined by the neural network model (C-score, Y-score, S-score and S-mean, as described in legend to Table 3) were fulfilled. Signal anchors were predicted if both the HMM predicted a signal anchor and two of the four criteria specified by the neural network model were fulfilled. Selected clones were subject to comparative analysis with database entries from C. elegans and other species. Alignments were made using Clustal X within MacVector 7.0 (Oxford Molecular) and the SignalP V2.0 web server [68] was used to chart hydrophobicity and potential cleavage sites in predicted protein sequences.

Cross-taxon similarity analysis
The relative similarity between N. brasiliensis EST sequences and those from the related parasitic nematodes Ancylostoma caninum/duodenale, Haemonchus contortus and Teladorsagia circumcincta was plotted with the SimiTri program [54], downloadable from [69].

Acknowledgements
We thank Michelle Lizotte-Waniewski for constructing one of the original cDNA libraries in Edinburgh. The work was supported by the Wellcome Trust through programme grants to R.M.M. and M.E.S., a project grant to M.L.B. and an International Travelling Fellowship to C.F.

References
Figure 1.
Figure 2.