Expressed sequence tag analysis in Cycas, the most primitive living seed plant

1 The New York Botanical Garden, 200th Street and Kazimiroff, Bronx, NY 10458-5126, USA
2 Genome Research Center, Cold Spring Harbor Laboratory, 500 Sunnyside Blvd, Woodbury, NY 11797, USA
3 Institut für Bioinformatik (IBI), GSF National Research Center for Environment and Health, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
4 New York University, Department of Biology, 1009 Main Building, New York, NY 10003, USA
5 Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012-1185
6 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
7 Biology Department, Duke University, Box 91000, Durham, NC 27708, USA
Genome Biology 2003, 4:R78 doi:10.1186/gb-2003-4-12-r78
Subject areas: Plant biology, Genome studies, Evolution
The electronic version of this article is the complete one and can be found online at: http://genomebiology.com/2003/4/12/R78
© 2003 Brenner et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Abstract

Background

Cycads are ancient seed plants (living fossils) with origins in the Paleozoic. Cycads are sometimes considered a 'missing link' as they exhibit characteristics intermediate between vascular non-seed plants and the more derived seed plants. Cycads have also been implicated as the source of 'Guam's dementia', possibly due to the production of S(+)-beta-methyl-alpha, beta-diaminopropionic acid (BMAA), which is an agonist of animal glutamate receptors.

Results

A total of 4,200 expressed sequence tags (ESTs) were created from Cycas rumphii and clustered into 2,458 contigs, of which 1,764 had low-stringency BLAST similarity to other plant genes. Among those cycad contigs with similarity to plant genes, 1,718 cycad 'hits' are to angiosperms, 1,310 match genes in gymnosperms and 734 match lower (non-seed) plants. Forty-six contigs were found that matched only genes in lower plants and gymnosperms. Upon obtaining the complete sequence from the clones of 37/46 contigs, 14 still matched only gymnosperms. Among those cycad contigs common to higher plants, ESTs were discovered that correspond to those involved in development and signaling in present-day flowering plants. We purified a cycad EST for a glutamate receptor (GLR)-like gene, as well as ESTs potentially involved in the synthesis of the GLR agonist BMAA.

Conclusions

Analysis of cycad ESTs has uncovered conserved and potentially novel genes. Furthermore, the presence of a glutamate receptor agonist, as well as a glutamate receptor-like gene in cycads, supports the hypothesis that such neuroactive plant products are not merely herbivore deterrents but may also serve a role in plant signaling.
Background

The Cycadales (cycads) are the most primitive living seed plants and have endured over 270-280 million years since their origins in the Lower Permian [1,2]. Cycads have a fern or palm-like appearance, largely due to their pinnately compound leaves (Figure 1a,b). Unlike ferns or palms, however, cycads belong to the gymnosperms, or non-flowering seed plants. Of the four orders that comprise the gymnosperms, the Cycadales are considered to be the most ancestral compared to Ginkgoales, Gnetales and Coniferales (Figure 2) [3,4]. Cycads (non-flowering seed plants) exhibit a number of characteristics that reflect their evolutionary position between ferns (non-seed plants) and angiosperms (flowering seed plants). Such characteristics include pollen tubes, which release motile sperm before fertilization; dichotomous branching (versus axillary branching in higher plants); and ovules, which contain a large, free-nuclear megagametophytic stage, that are borne on the margins of leaf-like megasporophylls [5-7]. These characteristics, among others, place cycads at a key node in plant evolution.
In addition to their evolutionary importance, cycads have also been studied in the field of medicine, because they produce neurotoxic compounds. In particular, cycads produce a secondary compound, BMAA (S(+)-beta-methyl-alpha, beta-diaminopropionic acid), which has been implicated as the possible cause of Guam's dementia [8]. This disorder occurs among the indigenous Chamorro people, who ate cycads as food, and now suffer from Alzheimer's and Parkinson's dementia [9-11]. BMAA production is unique to cycads, where it has been used as a monophyletic character in plant classification [7]. It is present in both seeds and leaves of all genera of the Cycadaceae [12]. BMAA is neurotoxic in mammals [9,13] because of its excitotoxic action as an agonist of glutamate receptors (GLRs) [14]. The discovery of GLR-like genes in Arabidopsis suggests that plant-derived GLR agonists, as well as acting as potential deterrents to herbivores, might also operate in signaling during plant growth and development, by interacting with native plant GLRs [15]. In partial support of this hypothesis, BMAA was shown to affect the development of Arabidopsis and consequently was used in a pharmacologically based genetic screen to isolate mutants in a putative GLR pathway in Arabidopsis [16].

Despite the importance of cycads in the study of plant evolution, and their role in neurological disorders in humans, nothing is known about the genes responsible for these traits - primarily because cycads are recalcitrant to genetic analysis. Unlike genetically tractable plants such as tomato, maize and Arabidopsis, cycads are dioecious (male and female organs on separate plants), produce a limited number of seeds and take up to 30 years to become reproductive. Furthermore, cycad genomes are large (20,000-30,000 million base-pairs (Mbp)) [17,18] compared to Arabidopsis (125 Mbp) [19].
Consequently, cycads have remained outside the realm of both traditional genetic studies and modern genome-sequencing initiatives. Fortunately, recent advances in plant genomics [20,21] provide new tools to study genetically complex species such as cycads. In particular, the availability of the complete, annotated sequence of two angiosperm genomes - the dicot Arabidopsis thaliana [19,22] and the monocot rice (Oryza sativa) [23,24] - now makes it possible to study the genomes of evolutionarily important plants by comparing the expressed genes of cycads (ESTs) to the complete genomes of higher plants.

To begin a survey of expressed genes of cycads, the genus Cycas was chosen for expressed sequence tag (EST) analysis because Cycas is at the basal node - that is, the sister taxon to the rest of the Cycadales [25-27]. Furthermore, the species Cycas rumphii Miq. was selected for this analysis as it is suspected to be the dietary cause of Guam's dementia. It has been established that in C. rumphii, from which the EST library was made, BMAA levels are nearly 0.1 mg/g tissue [28]. Because of its evolutionary position as a key node within the plant kingdom, as well as its medicinal significance to humans, Cycas is ideally suited for genomic prospecting [29].

Here, we describe the construction of a cycad EST database from RNA of young C. rumphii leaves. Using this database, our comparison revealed conserved genes, including those involved in development and signaling in present-day flowering plants. Our analysis defined a set of cycad clones that have no similarity to any known angiosperm genes, but possess similarity only to genes of other gymnosperms. Furthermore, as a first step to understanding the function of neurotoxins produced in cycads, we defined a number of candidate genes that encode putative enzymes involved in the biosynthesis of BMAA, as well as a cycad GLR-like gene, the suspected target of BMAA action in animal brains.
These cDNA tools will be useful to test whether BMAA, which has been postulated to serve as an herbivore deterrent [5], also acts to regulate GLR function in plants.

Results

Construction of a cDNA library from Cycas rumphii

At maturity, C. rumphii leaves can reach up to 3 meters in length (Figure 1a). The tissue used in this study consisted of 10 to 40 cm of the immature leaf terminus protruding from the crown collected shortly after emergence (Figure 1b). Immature leaves consist of a petiole, a central rachis and circinate leaflets composed of both expanding and meristematic cells [30]. RNA extracted from this tissue was used to construct a cDNA library from C. rumphii. Size fractionation was used to enrich for full-length cDNAs during library construction. It was determined that 53% of the cDNA clones were over 500 bp long. From this cDNA library, 4,210 sequence reads (ESTs) were generated. The majority of these reads (3,917) were generated from the 5' end of the cDNA; however, a small subgroup (293) were sequenced from the 3' end. Cluster analysis performed at the Munich Information Center for Protein Sequences (MIPS) of the entire EST dataset produced a UniGene set of 2,458 contigs consisting of 1,917 singletons and 541 assemblies. Of the clustered ESTs, the longest contig was 1,836 bp. The entire UniGene set can be viewed on the MIPS Sputnik website [31], which features sequence annotations and peptide sequence predictions. At the MIPS Sputnik site there are links to download the complete cycad sequences as an EST fasta file, a cluster fasta file or as the derived peptide fasta file.

Classification of C. rumphii ESTs by functional categories

Each contig from the database was automatically assigned to a functional category on the basis of its top match against the complete genomic sequence of Saccharomyces cerevisiae and A. thaliana databases using BLASTP. A non-stringent expect value (E-value) of <1e-10 was chosen as the threshold.
The pie chart in Figure 3 illustrates the relative fraction that each functional category comprises within the entire UniGene set. The four largest categories of cycad ESTs according to this functional categorization are: 'cellular organization' (22%), 'metabolism' (10%), 'unclassified proteins' (10%), and 'cell growth, cell division/DNA synthesis' (9%).
Cycad contig matches to genes in angiosperms, gymnosperms and lower plants

Using TBLASTX, the C. rumphii UniGene set was compared with all available ESTs from GenBank and predicted Arabidopsis genes from The Arabidopsis Information Resource (TAIR). Both ESTs and predicted genes were grouped into three subcategories: angiosperms, gymnosperms, and lower plants. The angiosperm database encompasses all annotated rice and Arabidopsis genes identified from their respective genomic sequences, as well as all higher plant ESTs. The gymnosperm database contains ESTs from all gymnosperms, the majority of which came from the Pinus taeda EST sequencing project [32,33]. The lower plant database included genes from all remaining plant ESTs, including ferns, fern allies, bryophytes and algae available in GenBank. The angiosperm subgroup made up 84.5%, the gymnosperms 6.5% and lower plants 9.0% of the total genes used in this analysis. The Venn diagram shown in Figure 4 displays the total number of cycad contigs shared between one or more of the plant gene datasets at very low BLAST stringency values (expect < 1e-5). The majority of cycad contigs (1,764/2,458) have counterparts in other plants, leaving 694 with no match to other plant genes. As one would expect, most Cycas hits (1,718) are to angiosperms, because of the predominance of angiosperm accessions in GenBank. Many of the cycad matches to angiosperms also match gymnosperms and/or lower plants (1,416). There are 1,310 cycad contigs that match gymnosperm genes and 734 that match genes from lower plants.
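The counts reported for Figure 4 can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in the Results (including the 44 gymnosperm-only contigs and the 2 contigs matching gymnosperms and lower plants but not angiosperms, which are given in the following subsection). The variable names are illustrative and are not part of the original analysis.

```python
# Consistency check of the contig counts reported for the Venn diagram (Figure 4).
# All numbers are taken from the text; variable names are illustrative only.
total_contigs = 2458            # UniGene set size
matched_any_plant = 1764        # contigs with a match at expect < 1e-5
angiosperm_hits = 1718          # contigs matching angiosperm genes
gymnosperm_only = 44            # contigs matching only gymnosperm genes
gymnosperm_and_lower_only = 2   # contigs matching gymnosperms and lower plants only

# Contigs with no match to any other plant gene.
unmatched = total_contigs - matched_any_plant

# Matched contigs that have no angiosperm counterpart.
no_angiosperm_match = matched_any_plant - angiosperm_hits

# Database composition (percent of genes per subgroup) should sum to 100.
database_share = {"angiosperms": 84.5, "gymnosperms": 6.5, "lower plants": 9.0}

print("unmatched contigs:", unmatched)
print("matched contigs without an angiosperm hit:", no_angiosperm_match)
print("database shares sum to:", sum(database_share.values()))
```

The second figure equals the 46 contigs selected for full-length sequencing, which implies that no contig matched lower plants alone.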
Full-length sequencing of cycad clones that match only gymnosperm genes

As shown in Figure 4, 44 Cycas ESTs specifically match only genes in the gymnosperm subgroup. Two additional Cycas ESTs match genes from gymnosperms and lower plants, but not angiosperms. To further analyze these 46 contigs that match only gymnosperms and/or lower plants, we next sequenced these Cycas cDNAs in their entirety to determine whether this 'gymnosperm/lower plant' specific grouping held up when the remaining portions of the cDNA were sequenced. Because ESTs, even when clustered into contigs, usually represent only a portion of the actual gene (particularly for genes poorly represented in the library), 37 of the 46 Cycas cDNAs were sequenced in their entirety (the remaining nine clones were not successfully recovered for sequencing), and this sequence can be downloaded from the Internet [34]. Of these 37 fully sequenced cDNAs, 14 clones still showed no similarity to any known angiosperm genes, even at this low stringency cut-off. The insert size for each clone ranges from 586 bp to 1,899 bp, with predicted open reading frames (ORFs) varying from 69 to 527 residues (Table 1). None of these 14 Cycas cDNA clones is homologous to any known genes outside the plant kingdom, although Interpro analysis identified a small number of conserved motifs, which are listed in Table 1. To confirm that these genes were indeed derived from C. rumphii, we used gene-specific primers designed for each of the 14 genes to amplify a fragment from genomic DNA isolated from a different C. rumphii specimen and a different tissue (sporophyll) from the source tissue of the cDNA library (data not shown). This distinct C. rumphii specimen was cultivated in a geographically separate location (Florida) from the C. rumphii specimen used for cDNA library construction (New York).

Table 1.
Fully sequenced cycad clones from contigs that match only genes in gymnosperms

Cycad genes similar to developmental regulators

A survey of the cycad EST dataset reveals a surprisingly large number of genes with highest similarity (BLASTP E-value < 1e-5) to genes with defined roles in growth and development in angiosperms (Table 2). Some of these Cycas genes have similarity to Arabidopsis transcription factors, including CONSTANS [35,36], two distinct homeobox genes [37] and a YABBY gene [38,39]. Other cycad ESTs have similarity to other regulators of Arabidopsis development, including ARGONAUTE [40] and COP9 [41,42].

Table 2. Genes in Cycas rumphii with potential roles in signaling, development and biosynthesis of BMAA

Cycas genes with similarity to Arabidopsis genes involved in signaling

A number of genes in our cycad EST library showed similarity to components of signaling pathways found in higher plants (Table 2). These genes include a photolyase blue-light receptor, genes involved in secondary signaling (including those for calmodulin, kinases, and phosphatases), a 14-3-3 protein, and genes involved in phytohormonal responses, including auxin (IAA-9 and IAA-13) pathways as reviewed in Chory and Wu [43]. Surprisingly, a Cycas EST with high similarity to plant GLR-like genes was also found (Table 2) [15,44]. The presence of a GLR-like gene in cycads is of particular interest as it relates to BMAA, as described below.

A predicted pathway for BMAA synthesis in Cycas is supported by EST analysis

BMAA, an agonist of mammalian GLRs, is a suspected causative agent of neurological disorders [9,13]. However, nothing is known about the genes and enzymes involved in the biosynthesis of BMAA. Because the structure of BMAA is similar to other beta-substituted alanines [45,46], it is likely that BMAA biosynthesis utilizes phosphoserine, cysteine, o-acetylserine or cyanoalanine as a beginning substrate. On this basis, a likely BMAA biosynthetic pathway is shown in Figure 5.
This would require a two-step reaction initiated with the transfer of NH3 at the beta-carbon of the substituted alanine (Figure 5a), followed by an addition of CH3 (Figure 5b) to produce BMAA (Figure 5c). NH3 transfer would require a nucleophilic reaction catalyzed by a cysteine synthase-like protein. A preliminary survey of genes in the cycad EST library identified candidate genes for both of these enzymatic steps (Table 2). For the first step, the cycad leaf EST library contains two ESTs, each of which encodes a cysteine synthase. For the second step, the EST library contains two potential methyltransferases (caffeic acid O-methyltransferase II and caffeoyl-CoA 3-O-methyltransferase). The second step would require a methyl donor, the most likely candidate being S-adenosylmethionine (SAdM). Consumption of SAdM would require the presence of enzymes to regenerate SAdM. A number of cycad ESTs can be implicated in SAdM recycling, including adenosylhomocysteinase, S-adenosylmethionine synthetase and homocysteine methyltransferase. Taken together, the cycad EST library contains candidate genes for all of the enzymes predicted to be present during the biosynthesis of BMAA.
Discussion

Cycads can be regarded as living fossils

Extant genera, such as Cycas, have changed little in morphology from their extinct relatives, such as Crossozamia, which existed during the Permian [1,2]. The study of cycads has proved to be useful in reconstructing plant evolution, in particular in understanding the rise of important plant structural innovations such as the evolution of seeds [47]. Cycads also produce a variety of neuroactive compounds, some of which are suspected to be the source of Guam's dementia [11,48]. However, despite their scientific importance in plant biology and medicine, virtually nothing is known regarding gene expression, development and signaling in the Cycadales. As a first step in this direction, a cDNA library was made from young, developing C. rumphii leaves to produce a cycad EST database.

A cycad EST database: a foundation to study the evolution of early seed plants

One advantage of a genomics approach is that it provides rapid access to genes important for evolutionary studies. The more traditional homology-based gene-cloning approach is limited by tedious gene-by-gene purification. It is also limited in that it may miss related genes if the degeneracy is too great or if nonconserved regions of the protein are chosen during primer design. Finally, the targeted gene approach can never be used to discover new genes.

Sequence analysis of contigs with BLAST similarity to gymnosperms but not angiosperms

An EST project in Pinus taeda (loblolly pine) sampled 59,797 transcripts from wood-forming tissues [32]. In this analysis, 66 P. taeda contigs showed BLAST similarity at low stringency only to other gymnosperms. Similarly, in our analysis, we found 46 cycad contigs that only matched gymnosperms (including P. taeda) and/or lower plant ESTs, but were not found in the genomes of higher plants or non-plants.
Complete sequencing of 37 of these cycad cDNA clones showed that 14 clones, ranging in length from 586 to 1,899 bp, were still found only in other gymnosperms. Having no homology to the completely sequenced genomes of two different angiosperm species - Arabidopsis [19] (a dicot) and rice [23,24] (a monocot) - suggests that these 14 genes are found only in gymnosperms or lower plants, in which genomic studies have only just begun. However, because ESTs as well as contigs usually represent only a portion of the full-length gene sequence, these results are preliminary. For instance, in P. taeda, larger contigs have a higher BLAST match rate to other plant genes than do shorter contigs [32]. Thus, these preliminary results of clade specificity are tenuous and presumably will change as more ESTs, as well as full-length gene sequences, from cycads and other species are generated in the future.

Genes with potential developmental roles in cycads

As in higher plants, cycad leaves are derived from the shoot apical meristem (SAM) [30]. In Cycas leaflet primordia, meristematic growth ceases at the apex, while proceeding basipetally where it becomes localized to the leaflet margins [30]. The presence of these marginal meristems may explain why a surprising number of developmental genes were identified in a relatively small number of ESTs from young cycad leaves (Table 2).

A gene with identity to the YABBY gene family was among the cycad ESTs. YABBY genes encode transcription factors expressed on the abaxial side of all lateral organs that promote abaxial cell fate [38]. In Arabidopsis, mutations in the YABBY gene INO (INNER-NO-OUTER) lead to the loss of the outer integument [49], reminiscent of gymnosperm (and cycad) unitegmy (the presence of a single integument). Unitegmy is considered to be the ancestral condition in seed plants [5,47].
An analysis of YABBY gene expression in cycads may help to explain the origin of the integument in gymnosperms, and/or possibly the second integument in angiosperms.

One cycad EST from the library has highest similarity to COP9. COP9 encodes a subunit of the COP9 signalosome complex, which controls multiple signaling pathways that regulate development in all eukaryotes [42,50]. In Arabidopsis, the cop9 mutant is constitutively photomorphogenic in dark-grown seedlings [51]. Some gymnosperms (in particular the Coniferales) are constitutively photomorphogenic when grown in the dark [52,53]. As yet, the phenotype of dark-grown cycad seedlings has not been fully evaluated. The discovery of a gene encoding a putative subunit of the COP9 complex in cycads could be a first step to define the ancestral, developmental role of the signalosome in gymnosperms, particularly with regard to its role in photomorphogenesis.

Another gene potentially involved in cycad development has highest similarity to the CONSTANS gene family, whose members are regulators of flowering time that follow internal and external (environmental) inputs in Arabidopsis [35]. Because cycads predate the evolution of flowers, it would be of interest to determine if CONSTANS genes in cycads temporally regulate sporophyll and cone induction, which typically follows a yearly cycle [5,6].

A cycad GLR-like gene expressed in tissue producing the GLR agonist BMAA

An unexpected finding of the Arabidopsis EST genome project was the discovery of GLR-like genes, or 'neural' receptor genes, in plants [15]. In Arabidopsis, the GLR-like gene family comprises 20 members [54]. Pharmacological evidence has linked Arabidopsis GLRs to light and/or growth signaling pathways [15,16]. Supplying exogenous BMAA to growing Arabidopsis seedlings was shown to block light-induced hypocotyl shortening and cotyledon expansion [16].
Because BMAA has such profound effects on Arabidopsis development, we have previously proposed that BMAA, or glutamate, the natural agonist of GLRs in humans, plays a physiological role in Arabidopsis [15,16]. Continuing genetic studies in Arabidopsis aim to identify the endogenous components of the BMAA-targeted pathway in plants [16]. Cycads produce BMAA [8,9]. One EST uncovered in the C. rumphii leaf cDNA library has a high degree of similarity to plant GLR genes (Table 2). This discovery is intriguing, because it suggests that BMAA might be interacting with native GLR gene products in cycads. To further investigate the relationship between cycad GLR genes and BMAA, we sought to identify cycad genes potentially involved in BMAA synthesis. From the structure of BMAA, we hypothesized that cycads produce BMAA in a simple two-step pathway, beginning with a β-substituted alanine. To enhance the probability of finding genes involved in BMAA synthesis, we made our cDNA library from tissues that produce relatively large quantities of BMAA (nearly 0.1 mg/g tissue) [28]. According to Ohlrogge and Benning, there is a 95% chance of finding the gene for a specified enzyme when it is expressed at 0.1% mRNA/protein by sampling only 3,000 ESTs from an unnormalized library [55]. Considering the prevalence of BMAA in Cycas, it is not surprising that we discovered cognate genes for the predicted enzymes for this BMAA biosynthetic pathway in the cycad EST database (Figure 5, Table 2). Future biochemical and molecular studies will determine if these genes play a part in BMAA synthesis. The discovery of GLR-like genes in C. rumphii raises the intriguing possibility that endogenous BMAA may interact with native cycad GLRs as a regulatory molecule. Future studies aim to understand the role of GLRs in plants, as well as the role of BMAA in herbivore defense versus endogenous signaling. 
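The sampling estimate attributed to Ohlrogge and Benning [55] above can be approximated with a simple model. The sketch below assumes that each EST is an independent draw and that a transcript of interest makes up a fixed fraction of the library; this binomial model is an assumption made here for illustration and is not necessarily the exact calculation used in [55]. The function names are hypothetical.

```python
import math


def prob_at_least_one(abundance: float, n_ests: int) -> float:
    """Probability of sampling a transcript at least once in n_ests reads,
    assuming independent draws with a fixed per-read probability."""
    return 1.0 - (1.0 - abundance) ** n_ests


def ests_needed(abundance: float, confidence: float) -> int:
    """Smallest number of reads giving at least the requested probability
    of seeing the transcript once, under the same independence assumption."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))


# Values from the text: 0.1% abundance, 3,000 ESTs, 95% chance.
p = prob_at_least_one(0.001, 3000)
n = ests_needed(0.001, 0.95)
print(f"P(at least one EST) with 3,000 reads: {p:.3f}")
print(f"Reads needed for a 95% chance: {n}")
```

Under this model, 3,000 reads give a probability just above 0.95 for a transcript at 0.1% abundance, in line with the figure quoted in the text; the 4,210 reads generated here would give a somewhat higher probability.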
The production of additional ESTs from cycads will increase the variety of genes available for study, so that a detailed expression profile can be evaluated during cycad development. Complementation studies of these genes in orthologous Arabidopsis mutations will help define their roles in cycads. This combined approach to studying cycad gene structure and function will help reveal molecular changes in genes involved in signaling, metabolic and developmental pathways that led to the rise of the seed plants.

Materials and methods

Tissue collection and library construction and DNA purification

Newly emerged immature leaves from the crown of a C. rumphii tree, accession 808/59 A, were collected from the New York Botanical Garden Conservatory. Leaves collected ranged from 5 to 30 cm in length. Tissue was frozen in liquid nitrogen. Frozen tissue was pulverized with a mortar and pestle, and RNA was extracted with the RNeasy maxi kit (Qiagen, Valencia, CA) according to the manufacturer's protocol. Purified Cycas RNA was precipitated in 2 M LiCl, washed twice with 70% ethanol, and resuspended in 50 μl water. Poly(A) RNA was subsequently purified from total RNA with the Oligotex Maxi kit (Qiagen). A cDNA library was constructed from 10 μg poly(A) RNA using the Lambda ZAP-CMV cDNA synthesis kit (Stratagene, La Jolla, CA). Before cloning, cDNA was size fractionated over a Sepharose CL-6b column. The first five fractions containing a total of around 100 ng cDNA were collected, pooled and precipitated in 70% ethanol/0.3 M sodium acetate and resuspended in 3.5 μl water. cDNA (0.5 μl) was then directionally subcloned into the vector at the EcoRI and XhoI sites. DNA was collected from unemerged C. rumphii sporophylls using the DNeasy purification kit (Qiagen).

EST sequencing

Plasmid DNA was collected as described in the in vivo mass excision section of the manual (Stratagene, catalog number 200450).
Sequence analysis was performed at Cold Spring Harbor Laboratory using an ABI 3700 capillary sequencer (Applied Biosystems, Foster City, CA) for separation and nucleotide detection. Reactions were performed using a 1/16 Big Dye Terminator. Sequencing was performed with the -21 M13 forward and/or reverse primer.

EST clustering and assignment into functional categories

The EST sequences were clustered and assembled using the HarvESTer application (Biomax informatics, Martinsried, Germany). The default HarvESTer settings were optimized to screen for vector against the UniVec nonredundant database of vector and polylinker sequences [56]. The Hashed Position Tree (HPT) clustering used a similarity link threshold of 0.7, and a maximum distance of six steps was required to define a cluster from the similarity network, thus encouraging the separation of likely paralogs. Cluster consensus sequences and concomitant alignments were derived from the HPT clusters using the CAP3 application with default settings. The HarvESTer assemblies and coordinate alignments were imported into the Sputnik EST and cluster analysis application [57].

Peptide extraction

BLASTX [58] was performed against a nonredundant protein database for each of the cluster consensus sequences. Likely coding sequences were derived for each cluster consensus sequence by parsing the best BLASTX match and filtering the results using the arbitrary expect value <1e-10. Dicodon usage frequencies and probabilities were extracted using tools from the ESTate package [59]. A peptide sequence was predicted for each of the cluster consensus sequences using the Framefinder application from the ESTate package with the cycad-specific codon usage statistics. Framefinder was run using the default parameters. The derived peptide sequences were used as the basic scaffold for peptide-based annotation in Sputnik.
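To illustrate what a dicodon usage table is, the following sketch counts in-frame dicodons (pairs of adjacent codons, six nucleotides) in a set of coding sequences and converts the counts to relative frequencies. It is a minimal sketch written for this purpose, not the ESTate implementation; the function name, the choice to advance one codon at a time, and the example sequence are all assumptions.

```python
from collections import Counter


def dicodon_frequencies(coding_seqs):
    """Relative frequencies of in-frame dicodons (hexamers) in coding sequences.

    Each sequence is assumed to start in frame 0. The window advances one
    codon (3 nt) at a time, so consecutive dicodons share one codon.
    """
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 5, 3):
            dicodon = seq[i:i + 6]
            # Skip windows containing ambiguous bases such as N.
            if set(dicodon) <= set("ACGT"):
                counts[dicodon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dicodon: n / total for dicodon, n in counts.items()}


# Hypothetical toy coding sequence, for illustration only.
freqs = dicodon_frequencies(["ATGGCTGCTTAA"])
for dicodon, freq in sorted(freqs.items()):
    print(dicodon, round(freq, 3))
```

A frame-prediction tool can compare such frequencies, computed from known coding regions, against each candidate reading frame of a consensus sequence; how Framefinder weights them is not described in the text.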
Sequence annotation

Sequence annotation of each of the cycad cluster consensus sequences and derived peptides was performed within the Sputnik application. Results were assessed for possible contamination by searching for homology to the Escherichia coli and human genomes and were scored for homology to a wide range of noncoding RNAs and plant chloroplast and mitochondrial genomes. Similarity searches were performed using the BLAST application [58] and results were filtered using the expectation value < 1e-10. Functional assignment was performed on both the cluster consensus sequence and the peptide sequence. Assignments were made using BLASTX and BLASTP, respectively, against the MIPS catalog of functionally assigned proteins (funcat) [60,61]; tentative functional assignments were filtered using the expectation value < 1e-10.

Categorization of cycad contigs

All cycad contig sequences were aligned against the PlantEST database using TBLASTX [58] and against the NR(aa) database using BLASTX. The PlantEST database was created by downloading all plant ESTs in GenBank and assembling them using Phrap [60,61]. Todd Wood from Clemson University provided the PERL script that creates the PlantEST databases as described above. The NR(aa) database is a nonredundant database of protein sequences from GenBank.

Determination of gymnosperm-specific genes

All available plant ESTs were downloaded from GenBank and separated into three datasets consisting of angiosperms (monocots and dicots), gymnosperms, or lower plants (ferns, mosses and algae). Downloaded ESTs were assembled using Phrap [60,61]. All matches with an expect value < 1e-5 were considered significant.
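The classification step described above can be sketched as follows. The code assumes that the BLAST results have already been reduced to (contig, dataset, E-value) records; that record layout, the function names and the example hits are hypothetical and are not taken from the original pipeline. Only the three dataset labels and the 1e-5 cut-off come from the text.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # significance threshold stated in the Methods


def datasets_matched(hits, cutoff=E_VALUE_CUTOFF):
    """Map each contig to the set of datasets in which it has a significant hit.

    hits: iterable of (contig_id, dataset, e_value) tuples, where dataset is
    'angiosperm', 'gymnosperm' or 'lower_plant'. Contigs without any
    significant hit do not appear in the result.
    """
    matched = defaultdict(set)
    for contig_id, dataset, e_value in hits:
        if e_value < cutoff:
            matched[contig_id].add(dataset)
    return dict(matched)


def without_angiosperm_match(matched):
    """Contigs with significant hits, none of which is to an angiosperm."""
    return sorted(c for c, ds in matched.items() if "angiosperm" not in ds)


# Hypothetical example records, for illustration only.
example_hits = [
    ("contig_A", "angiosperm", 1e-40),
    ("contig_A", "gymnosperm", 1e-35),
    ("contig_B", "gymnosperm", 1e-12),
    ("contig_B", "angiosperm", 0.3),      # above the cut-off, ignored
    ("contig_C", "lower_plant", 1e-8),
    ("contig_C", "gymnosperm", 1e-6),
    ("contig_D", "angiosperm", 2e-5),     # above the cut-off, ignored
]

matched = datasets_matched(example_hits)
print(without_angiosperm_match(matched))
```

In this toy input, contig_B and contig_C correspond to the 'gymnosperm only' and 'gymnosperm plus lower plant' groups, whereas contig_D has no significant match and would count among the unmatched contigs.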
Acknowledgements

We thank Francesco Coelho, Javier Francisco Ortega and the Montgomery Botanical Center, Florida for providing plant tissue; Dan Chamovitz and Trevor Stokes for reviewing the manuscript; Vivekanand Balija and Neilay Dedhia for sequence generation and curation; Eduardo de la Torre and Eugene Mueller for helpful discussions; and Alex Clark and Ayelet Levy for technical help. Funding for this work comes from the Plant Genomics Consortium. The Plant Genomics Consortium is made possible by the generosity of the Altria Group, The Mary Flagler Cary Charitable Trust, The Eppley Foundation for Research, The Leon Lowenstein Foundation, The Ambrose Monell Foundation, The Wallace Genetic Foundation and the National Institutes of Health, grant number GM-32877 to G.C. and an NIH postdoctoral fellowship to E.B.
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.