Abstract
Background
Dosage imbalance is responsible for several genetic diseases, among which Down syndrome is caused by the trisomy of human chromosome 21.
Results
To elucidate the extent to which the dosage imbalance of specific human chromosome 21 genes perturbs distinct molecular pathways, we developed the first mouse embryonic stem (ES) cell bank of human chromosome 21 genes. The human chromosome 21-mouse ES cell bank includes, in triplicate clones, 32 human chromosome 21 genes, which can be overexpressed in an inducible manner. Each clone was transcriptionally profiled in inducing versus non-inducing conditions. Analysis of the transcriptional response yielded results that were consistent with the perturbed gene's known function. Comparison between mouse ES cells containing the whole human chromosome 21 (trisomic mouse ES cells) and mouse ES cells overexpressing single human chromosome 21 genes allowed us to evaluate the contribution of single genes to the trisomic mouse ES cell transcriptome. In addition, for the clones overexpressing the Runx1 gene, we compared the transcriptome changes with the corresponding protein changes by mass spectrometry analysis.
Conclusions
We determined that only a subset of genes produces a strong transcriptional response when overexpressed in mouse ES cells and that this effect can be predicted taking into account the basal gene expression level and the protein secondary structure. We showed that the human chromosome 21-mouse ES cell bank is an important resource, which may be instrumental towards a better understanding of Down syndrome and other human aneuploidy disorders.
Background
Aneuploidy refers to an abnormal copy number of genomic elements, and is one of the most common causes of morbidity and mortality in humans [1,2]. The importance of aneuploidy is often neglected because most of its effects occur during embryonic and fetal development [3]. Initially, the term aneuploidy was restricted to the presence of supernumerary copies of whole chromosomes, or absence of chromosomes, but this definition has been extended to include deletions or duplications of sub-chromosomal regions [4,5]. Gene dosage imbalance represents the main factor in determining the molecular pathogenesis of aneuploidy disorders [6].
Our interest is focused on the elucidation of the molecular basis of gene dosage imbalance in one of the most clinically relevant and common forms of aneuploidy, Down syndrome (DS). DS, caused by the trisomy of human chromosome 21 (HSA21), is a complex condition characterized by several phenotypic features [6], some of which are present in all patients while others occur only in a fraction of affected individuals. In particular, cognitive impairment, craniofacial dysmorphology and hypotonia are the features present in all DS patients. On the other hand, congenital heart defects occur in only approximately 40% of patients. Moreover, duodenal stenosis/atresia, Hirschsprung disease and acute megakaryocytic leukemia occur 250-, 30- and 300-times more frequently, respectively, in patients with DS than in the general population. Individuals with DS are affected by these phenotypes to a variable extent, implying that many phenotypic features of DS result from quantitative differences in the expression of HSA21 genes. Understanding the mechanisms by which the extra copy of HSA21 leads to the complex and variable phenotypes observed in DS patients [7,8] is a key challenge.
The DS phenotype is clearly the outcome of the extra copy of HSA21. However, this view does not completely address the mechanisms by which the phenotype arises. Korbel et al. [9] provided the highest resolution DS phenotype map to date and identified distinct genomic regions that likely contribute to the manifestation of eight DS features. Recent studies suggest that the effect of the elevated expression of particular HSA21 genes is responsible for specific aspects of the DS phenotype. Arron et al. [10] showed that some characteristics of the DS phenotype can be related to an increase in dosage expression of two HSA21 genes, namely those encoding the transcriptional activator DSCR1-RCAN1 and the protein kinase DYRK1A. These two proteins act synergistically to prevent nuclear occupancy of nuclear factor of activated T cells, namely cytoplasmic, calcineurin-dependent 1 (NFATc) transcription factors, which are regulators of vertebrate development. Recently, Baek et al. showed that the increase in dosage of these two proteins is sufficient to confer significant suppression of tumour growth in Ts65Dn mice [11], and that such resistance is a consequence of a deficit in tumour angiogenesis arising from suppression of the calcineurin pathway [12]. Overexpression of a number of HSA21 genes, including Dyrk1a, Synj1 and Sim2, results in learning and memory defects in mouse models, suggesting that trisomy of these genes may contribute to learning disability in DS patients [13-15].
Many phenotypic features of DS are determined very early in development, when the tissue specification is not completely established [3]. Early postnatal development of both human patients and DS mouse models showed the reduced capability of neuronal precursor cells to correctly generate fully differentiated neurons [16], contributing to the specific cognitive and developmental deficits seen in individuals with DS [17]. Canzonetta et al. [18] showed that DYRK1A-REST perturbation has the potential to significantly contribute to the development of defects in neuron number and altered morphology in DS. The premature reduction in REST levels could skew cell-fate decisions to give rise to a relative depletion in the number of neuronal progenitors.
The exact nature of these events and the role played by increased dosage of individual HSA21 genes remain unknown. To contribute to answering these questions, we have established a cell bank consisting of mouse embryonic stem (mES) cell clones capable of the inducible overexpression of each one of 32 selected genes, 29 murine orthologs of HSA21 genes and 3 HSA21 coding sequences, under the control of the tetracycline-response element (tetO). These genes include thirteen transcription factors, one transcriptional activator, six protein kinases and twelve proteins with diverse molecular functions. By transcriptome and proteome analysis, we determined that these clones, which are able to differentiate into different cell lineages, can be used to unveil the pathways in which these genes are involved. We believe that this resource represents a valuable tool to analyse the genetic pathways perturbed by the dosage imbalance of HSA21 genes.
Results
Validation of an inducible/exchangeable system for generation of transgenic mES cells
In order to generate a library of mES transgenic lines of selected HSA21 genes, we used the ROSA-TET system. This integrates the inducible expression of the Tet-off system, the endogenous and ubiquitous expression from the ROSA26 locus, and the convenience of transgene exchange provided by the recombination-mediated cassette exchange (RMCE) system [19]. Briefly, coding sequences are cloned into an expression vector, driven by an inducible promoter (Tet-off), which can be easily integrated into the ROSA26 locus through a cassette exchange reaction.
Understanding the expression kinetics of the system was essential to standardizing the generation of the mES library encoding the HSA21 genes. Towards this goal, we first tested the system by introducing the luciferase (Luc) gene, cloned into an exchange vector. This enabled accurate quantification of cassette exchange and gene inducibility, at both the RNA and protein levels. To this end, we prepared an exchange vector (pPTHC-Luc), which was introduced into the EBRTcH3 ES cell line (EB3), carrying a yellow fluorescent protein (YFP) gene integrated in the ROSA26 locus. After the RMCE procedure, positive exchanged clones were identified by PCR (Additional file 1a) and their inducibility verified using both reporter genes. Quantitative PCR (q-PCR) analysis of Luc expression showed that the system was activated upon the removal of tetracycline (Tc) from the medium. In the presence of Tc (0 hours; see Materials and methods), Luc mRNA was undetectable, indicating that the background expression level was almost zero, whereas a strong signal was detected 15 hours after Tc withdrawal and was sustained over a time window of 48 hours (Additional file 1b). We then compared the mRNA level with the enzymatic activity of the Luc protein. To this end, we prepared the protein extracts of the Luc-inducible mES clones at the same time points to quantify luminescence. In agreement with the mRNA data, the enzymatic activity was undetectable in the presence of Tc, whereas a strong signal was measurable 15 hours after Tc withdrawal, indicating a correct induction of Luc translation (Additional file 1b).
Additional file 1. Identification and validation of inducible/exchangeable recombinant mES clones. (a) Recombinant mES clones were identified by PCR analysis. (b) q-PCR analysis and Luciferase assays using the Dual Luciferase Reporter Assay System were performed on mES clones overexpressing the firefly luciferase (Luc) gene. The system was activated upon the removal of Tc (after 17, 24, 39 and 48 hours) from the medium. Protein extracts of mES cells were prepared at the same time points and luminescence quantified. (c) q-PCR analysis and YFP fluorescence assay to detect the expression of the YFP reporter. (d) Expression of mES cells overexpressing Luc after 24 hours from the complete removal of Tc from the medium; the degree of induction was easily manipulated by titrating the Tc. (e) Expression profile (q-PCR) of the pluripotency gene Oct3/4, and of markers of the mesoderm (Brachyury), ectoderm (Gfap) and endoderm (Afp) during differentiation of EB3 and of the parent cell line (E14).
We next verified the expression of the YFP reporter gene, which is separated from the Luc gene in the recombinant locus by an IRES sequence, and we detected a comparable level of YFP expression and protein accumulation following induction. The maximal expression of the reporter gene was observed 24 hours after complete removal of Tc from the medium (Additional file 1c).
The level of gene expression can be regulated by adjusting the concentration of Tc in the culture media. Using a ten-fold dilution of Tc, negligible expression of the YFP gene was seen (Additional file 1d), while further dilution of Tc revealed increasing expression levels of YFP.
We then verified the growth properties of this mES line (EB3) compared to the parental line (E14) (data not shown) and the ability of these cells to differentiate along the three germ layers. The EB3 cells displayed the expected transcript down-regulation of the pluripotency gene Oct3/4, and a marked increase of the mesoderm-specific marker Brachyury, of the ectoderm-specific marker Gfap and the endoderm-specific marker Afp during mES differentiation (Additional file 1e).
Collectively these data suggest that, in mES cells, this system allows the efficient and long-term overexpression of the transgene in a dose- and time-dependent manner. It is therefore suitable for systematic expression of HSA21 cDNAs.
Cell bank: the HSA21 gene collection in mES cells
HSA21 is syntenic to three different mouse chromosomal regions located on chromosomes 10, 16 and 17. These three regions contain 175 murine orthologs of protein coding HSA21 genes according to [20].
For the generation of mES clones with inducible overexpression, we selected a subset of 32 genes, 29 of which are murine orthologs of HSA21 genes, and 3 of which are human coding sequences (see also Materials and methods). The 32 genes encode 13 transcription factors (Aire, Bach1, Erg, Ets2, Gabpa, Nrip1, Olig1, Olig2, Pknox1, Runx1, Sim2, ZFP295, 1810007M14Rik), a single transcriptional activator (Dscr1-Rcan1), 6 protein kinases (DYRK1A, SNF1LK, Hunk, Pdxk, Pfkl, Ripk4) and 12 proteins with diverse molecular functions (Atp5j, Atp5o, Cct8, Cstb, Dnmt3l, Gart, Dscr2-Psmg1, Morc3, Mrpl39, Pttg1ip, Rrp1, Sod1) (refer to Additional file 2 for more general information about these genes).
Additional file 2. List of 32 genes overexpressed in mouse ES cells. In this table we list the 32 genes selected to be integrated in the Rosa26 locus and overexpressed using the Tet-off system in mES cells.
For a subset of the selected genes, there is evidence for the presence of different alternatively spliced isoforms that may differ in their coding sequence. In such cases, we overexpressed the longest annotated coding sequence. For one transcription factor (ZFP295) and two protein kinases (DYRK1A, SNF1LK), we used the human coding sequences (see also Materials and methods). A schematic representation of our experimental strategy is shown in Figure 1.
Figure 1. Schematic representation of the experimental strategy used. A set of 32 genes, 29 murine orthologs of HSA21 genes and 3 human coding sequences, were cloned into the pPthC vector [19] and nucleofected along with a pCAGGS-Cre recombinase vector [41] into EBRTcH3 (EB3) cells. Puromycin-resistant clones were isolated and grown in medium deprived of tetracycline for varying periods of time to perform a time course of induction. The inducibility of selected clones was evaluated by q-PCR. Global transcriptome and proteome analysis was performed by hybridization onto an Affymetrix gene chip and by large-gel two-dimensional gel electrophoresis (2DGE), respectively, to delineate the consequences of gene dosage imbalance on a single gene basis. WB, western blot.
In order to generate the mES library overexpressing a subset of HSA21 ORFs, we employed the ROSA-TET system, as previously described. The expression construct contained the 3xFLAG epitope at the carboxyl terminus, thus enabling monitoring of transgene protein product. We constructed exchange vectors carrying each of the 32 ORFs and then nucleofected the plasmids into the RMCE recipient mES lines to generate stable clones (see Materials and methods). For each gene, an average of 20 drug-resistant clones were picked, amplified and characterized by PCR analysis.
Three positive clones for each gene were grown in medium deprived of Tc for varying periods of time, and a time course experiment was performed to verify the sensitivity of each mES line to Tc and the capacity of each transgene to be overexpressed. In total we analyzed 96 clones (3 biological replicates for 32 transgenes). As shown in Additional file 3, we performed a time course experiment, at four different time points (17, 24, 39 and 48 hours), for 16 genes: 3 transcription factors (Aire, Sim2 and ZFP295), a protein kinase gene (Hunk) and all the 12 genes encoding proteins with diverse molecular functions (Atp5j, Atp5o, Cct8, Cstb, Dnmt3l, Gart, Dscr2-Psmg1, Morc3, Mrpl39, Pttg1ip, Rrp1, Sod1). Since the majority of the genes analyzed showed the highest level of induction after 24 hours of Tc deprivation, we decided to test the inducibility of the remaining clones at one time point only. As shown in Additional file 3, we tested the clones for 12 genes at one time point: the transcription factors (Bach1, Erg, Ets2, Gabpa, Nrip1, Olig1, Pknox1, Runx1, 1810007M14Rik), the transcriptional activator Dscr1-Rcan1 and the protein kinases Pdxk and Pfkl. Finally, one transcription factor (Olig2) and three protein kinases (DYRK1A, SNF1LK and Ripk4) were tested at three different time points (17, 24, and 39 hours). As a control, total RNA extracted from uninduced clones (in the presence of Tc, 0 hours) was used.
Additional file 3. Time course of induction of three clones (biological replicates) selected for each gene. In this table we report the time course of the induction of mES clones that overexpress the 32 ORFs. For each gene, three drug-resistant mES biological replicates, whose names are indicated in the specific column, were selected to be tested for their sensitivity to Tc removal from the medium.
Figure 2 shows the average induction, evaluated by q-PCR (Additional file 4) and expressed as relative expression (2^(-dCt)), of the 13 transcription factors together with the single transcriptional activator (Figure 2a), the 6 kinases (Figure 2b), and the 12 genes with diverse molecular functions (Figure 2c). For the 13 transcription factors and the transcriptional activator (Figure 2a) and the 6 kinases (Figure 2b) we assessed the potential leakiness of the inducible system in our mES clones. To this aim, we compared the basal expression level of each gene in the parental cell line (EB3) with the expression level in the corresponding transgenic inducible clones (in the biological replicates) grown in the presence of Tc in the medium (0 hours of induction). Results are shown in Figure 2a,b and in Additional file 5. We verified that only in the case of Pdxk is there a statistically significant (corrected P-value false discovery rate (FDR) = 0.04), albeit mild, leakiness.
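The relative expression measure used here can be illustrated with a minimal sketch. It assumes that dCt is the difference between the threshold cycle (Ct) of the target gene and that of a reference gene; the reference gene, the function names and all Ct values below are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Relative expression 2^(-dCt), with dCt = Ct(target) - Ct(reference)."""
    return 2.0 ** -(ct_target - ct_reference)

def summarize_replicates(ct_pairs):
    """Mean and standard deviation of 2^(-dCt) across biological replicates."""
    values = [relative_expression(target, ref) for target, ref in ct_pairs]
    return mean(values), stdev(values)

# Hypothetical (target Ct, reference Ct) pairs for three replicate clones.
uninduced = [(30.0, 18.0), (30.5, 18.0), (29.5, 18.0)]  # 0 hours, Tc present
induced = [(24.0, 18.0), (24.5, 18.0), (23.5, 18.0)]    # after Tc withdrawal

mean_0h, sd_0h = summarize_replicates(uninduced)
mean_induced, sd_induced = summarize_replicates(induced)

# Ratio of average relative expression, induced versus uninduced.
fold_induction = mean_induced / mean_0h
print(f"fold induction: {fold_induction:.1f}")
```

Because each cycle corresponds to a doubling of template, a drop of six cycles in the target Ct at constant reference Ct corresponds to a 64-fold increase in relative expression in this illustration.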
Figure 2. Average induction of the 32 inducible clones by q-PCR. Baseline expression (0 hours of induction - white bars), following induction of transgene (after 24 to 48 hours of growth in medium deprived of Tc - gray bars), and relative expression in the parental cell line (EB3 - black bars). (a) The 13 transcription factors and the single transcriptional activator (Dscr1-Rcan1); (b) the 6 kinases; (c) the other 12 genes with diverse molecular functions. Asterisks indicate statistically significant expression changes (t-test with false discovery rate <0.05). The error bars are calculated on the biological triplicates.
Additional file 4. Primer pairs used in q-PCR.
Additional file 5. Comparison of relative expression levels for 20 genes in the EB3 parental cell line and in the inducible clones at 0 hours of induction by multiple statistical t-tests. In this table we show the comparison of the relative expression of 20 genes (the 13 transcription factors, the single transcriptional activator and the 6 kinases) in the EB3 cell line versus the corresponding transgenic inducible clones (in the biological replicates) grown in the presence of Tc (0 hours of induction).
We then checked for the proper ploidy of the clones following extensive passages in culture. To this end, we performed a karyotype assay (Materials and methods) on parental ES cells (EB3) and on 20 different inducible clones of our mES cell bank (representing the 7 effective and the 13 silent genes). All these clones turned out to display a normal karyotype (40 chromosomes).
Transcriptome analysis of mES cell lines
In order to identify the effects of the overexpression of a single gene on the mES transcriptome, we performed Affymetrix Gene-Chip (Mouse 430_2) hybridization experiments for a set of clones overexpressing 20 of the 32 genes (that is, the transcription factors, the transcriptional activator and the protein kinases). As we used biological triplicate clones for each gene, this analysis was performed on a total of 60 clones. Total RNA was extracted from each clone at the time-point of maximal expression (Additional file 3), following Tc removal from the medium (Materials and methods). As a control, total RNA extracted from un-induced clones was also used. This procedure resulted in a total of 120 hybridization experiments (the whole set of results is available in the Gene Expression Omnibus database [GEO:GSE19836]).
In order to identify downstream transcriptional effects of the 20 overexpressed genes, microarray data were analyzed to detect differentially expressed genes (that is, in induced versus non-induced cells). We first normalized together both induced and non-induced hybridizations, and then detected differentially expressed genes using a Bayesian t-test method (Cyber-t) followed by FDR correction (threshold FDR < 5%). The overexpression of 7 out of 20 genes perturbed the mES transcriptome in a statistically significant manner: we will refer to these seven genes as the 'effective' genes, as opposed to the other 13, 'silent' genes. In Additional files 6, 7, 8, 9, 10, 11 and 12, we report complete lists of differentially expressed genes following the overexpression of each of the effective genes.
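The FDR thresholding step can be sketched as follows. The text does not state which FDR procedure was applied after the Cyber-t test, so the Benjamini-Hochberg procedure is used here purely as an assumption; the function names and probe identifiers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_probes(p_by_probe, fdr_threshold=0.05):
    """Probe identifiers whose q-value falls below the FDR threshold."""
    probes = list(p_by_probe)
    q_values = benjamini_hochberg([p_by_probe[p] for p in probes])
    return sorted(p for p, q in zip(probes, q_values) if q < fdr_threshold)

# Hypothetical per-probe p-values from an induced versus non-induced test.
example = {"probe_a": 0.001, "probe_b": 0.2, "probe_c": 0.012}
print(significant_probes(example))
```

Under this scheme a gene would be called 'effective' when its overexpression leaves at least one probe below the 5% threshold; the exact criterion used in the study may differ.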
Additional file 6. Complete list of differentially expressed genes following the overexpression of Aire, one of the effective genes.
Additional file 7. Complete list of differentially expressed genes following the overexpression of Erg, one of the effective genes.
Additional file 8. Complete list of differentially expressed genes following the overexpression of Nrip1, one of the effective genes.
Additional file 9. Complete list of differentially expressed genes following the overexpression of Olig2, one of the effective genes.
Additional file 10. Complete list of differentially expressed genes following the overexpression of Pdxk, one of the effective genes.
Additional file 11. Complete list of differentially expressed genes following the overexpression of Runx1, one of the effective genes.
Additional file 12. Complete list of differentially expressed genes following the overexpression of Sim2, one of the effective genes.
The effective genes consisted of six transcription factors (Runx1, Erg, Nrip1, Sim2, Olig2 and Aire) and one kinase (Pdxk). Differential expression was also validated by q-PCR, selecting a subset of the most up-regulated and down-regulated genes (Additional file 13). In order to identify possible biological processes in which the effective genes are involved, we performed a Gene Ontology (GO) enrichment analysis on the lists of differentially expressed genes. We used the DAVID online tool [21-23], restricting the output to biological process terms of levels 4 and 5, with a significance threshold of FDR < 5% and fold enrichment ≥ 1.5. In Table 1 we report the subsets of significant GO terms for six (Runx1, Erg, Nrip1, Olig2, Pdxk and Aire) out of the seven effective genes that were in agreement with their known function, as suggested by evidence in the literature. A complete list of all significantly enriched GO terms for the seven effective genes is reported in Additional file 14.
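The two quantities behind such an enrichment filter can be sketched as follows. This is an illustration only: DAVID computes its own modified Fisher exact statistic, whereas the sketch substitutes a plain hypergeometric upper tail, and all counts and function names are hypothetical.

```python
from math import comb

def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Fraction of list genes carrying a GO term, relative to the background fraction."""
    return (hits_in_list / list_size) / (hits_in_background / background_size)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical counts: 10 of 100 differentially expressed genes carry a term
# that annotates 200 of 20000 background genes.
fe = fold_enrichment(10, 100, 200, 20000)
p = hypergeom_upper_tail(10, 100, 200, 20000)
print(f"fold enrichment = {fe:.1f}, p = {p:.2e}")
```

A term would pass the filter described above when its fold enrichment is at least 1.5 and its multiple-testing-corrected significance is below 5%.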
Additional file 13. Summary of q-PCR validation of microarray data. In this table we show the validation by q-PCR of the differential expression of a subset of the most up-regulated and down-regulated genes detected by microarray analysis of the seven effective genes, as ranked by differential expression ratio.
Table 1. Gene Ontology enrichment analysis for six out of seven effective genes whose overexpression perturbed the mES transcriptome in a statistically significant manner
Additional file 14. A complete list of significantly enriched GO terms for the seven effective genes.
High basal expression level of HSA21 genes in mES cells correlates with a lack of transcriptional response following their overexpression
A possible explanation for the lack of a strong transcriptional response following the overexpression of the silent genes could be that they failed to disturb mES cell homeostasis because the synthesized protein was rapidly degraded. To test this hypothesis, we grew three clones for each effective and for each silent gene in medium deprived of Tc for 24 hours or 48 hours to induce the expression of their protein products. Our expression construct contains the epitope 3xFLAG at the carboxyl terminus of each gene, which allows the detection of the expression of each corresponding protein product by western blotting. A significant protein band was visible on the western blot for all the genes tested, thus leading us to reject this hypothesis.
An alternative hypothesis is that these genes have a high basal expression level in mES cells, and therefore their overexpression will result in only a weak effect on the mES transcriptome. In order to verify this hypothesis, we estimated, using all the 120 microarray experiments, the average expression level of each gene, and its corresponding standard deviation. We reasoned that, due to the large number of arrays, the average expression level for each gene can be considered as a reliable estimate of its basal level of expression in mES cells. In Additional file 15 and in Figure 3a we rank HSA21 genes according to their average expression level, from the most to the least expressed. We highlight in red the 13 silent genes and in blue the 7 effective genes. It is evident that the effective genes show a different distribution from the silent genes: the silent genes tend to be highly endogenously expressed in mES cells, whereas the effective genes tend to be expressed at lower levels. A gene set enrichment analysis (GSEA) [24] was performed to compute the significance of this different distribution (see Materials and methods); this produced a significant enrichment score of 0.402 (FDR q-value = 0). This observation supports the hypothesis that the lack of a strong transcriptional response following the overexpression of some of the HSA21 genes is due to a high basal expression level of these genes.
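The idea behind an enrichment score on a ranked list can be sketched with a simplified, unweighted running sum. This is not the GSEA implementation cited above, which weights hits by their ranking metric and estimates significance by permutation; the gene names and function name below are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score along a ranked gene list.

    The sum steps up at each member of the gene set and down otherwise;
    the score is the running value with the largest absolute deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking from the most to the least expressed gene.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"g5", "g6"}))  # set concentrated at the bottom
```

A positive score indicates that the set is concentrated among the most highly expressed genes, which is the pattern described for the silent genes.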
Additional file 15. List of all the mouse orthologs of HSA21 genes sorted according to their basal expression level in mES cells (from the most to the least expressed).
Figure 3. The basal expression level and dosage sensitivity of HSA21 genes in mES cells. The effective genes are highlighted in blue, and the silent genes in red. (a) Selected HSA21 genes sorted according to their average expression level in mES cells, from the most (gene rank = 1) to the least expressed. (b) Selected HSA21 genes sorted according to the total length of the 'disordered' region of the encoded protein (measured with the GlobPlot tool).
Dosage sensitivity of HSA21 genes in mES cells
We further investigated the cause of the lack of a strong transcriptional response in the silent gene set in order to predict which genes are most sensitive to dosage. A recent study has shown a strong correlation between the sensitivity to increased dosage of a gene and the degree of a certain property of the encoded protein, called intrinsic disorder [25]. The protein disorder is defined as the total number of amino acids included in unstructured regions of the protein. These regions usually contain short sequence motifs (such as localization signals, or nuclear import/export signal), leading to a higher sensitivity to protein dosage [25]. We thus measured protein disorder for both silent and effective genes, excluding the clones in which the human coding sequences were introduced (ZFP295, DYRK1A, SNF1LK) from this analysis because of the possible confounding effect represented by their non-murine origin. In Figure 3b, the silent and effective genes are clearly segregated according to their average level of protein disorder (separation of means verified with t-test, P-value = 0.043). The segregation is almost perfect (with a threshold value for the protein disorder equal to 180) with the only exception being Pdxk, which is an effective gene despite its low disorder value of 26. We attribute this anomaly to the fact that Pdxk is a kinase (the only one in the effective gene list), and its function might place it at the crossroads of a number of crucial pathways.
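The threshold rule described above can be written as a small sketch. Only the threshold (180) and the Pdxk entry (disorder 26, effective) come from the text; the other entries are placeholders with hypothetical names and values, and treating the threshold as inclusive is an assumption.

```python
DISORDER_THRESHOLD = 180  # threshold value quoted in the text

def predicted_effective(disordered_residues, threshold=DISORDER_THRESHOLD):
    """Predict a transcriptional response when disorder reaches the threshold."""
    return disordered_residues >= threshold

def misclassified(observations, threshold=DISORDER_THRESHOLD):
    """Genes whose observed class disagrees with the threshold prediction."""
    return sorted(
        gene
        for gene, (disorder, is_effective) in observations.items()
        if predicted_effective(disorder, threshold) != is_effective
    )

observations = {
    "Pdxk": (26, True),                   # value and class reported in the text
    "hypothetical_gene_A": (310, True),   # placeholder, not a measurement
    "hypothetical_gene_B": (40, False),   # placeholder, not a measurement
}
print(misclassified(observations))
```

With these inputs the only disagreement is Pdxk, mirroring the single exception noted in the text.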
Comparison with the transcriptional response of the transchromosomic Tc1 mouse line
To demonstrate the potential value of our cell bank in elucidating the transcriptional changes underlying trisomy 21, we compared the output of our overexpression experiments with the transcriptional profile obtained on the 'transchromosomic' Tc1 mouse line [26]. The Tc1 ES cells carry an extra copy of HSA21 and they represent a reference model of trisomy 21 for which publicly accessible transcriptional data in ES cells are available, enabling a direct comparison with our cell bank overexpression experiments. As reported in [26], the Tc1 line is missing some portions of HSA21; however, based on the published chromosome map, we verified that all seven 'effective' genes are included in the extra chromosome present in the Tc1 line.
Figure 4 shows a scatter plot of the differential expression values following the overexpression of the cell bank genes compared to the differential expression values of genes in the Tc1 ES cell line. We included in this analysis all of the genes that were significantly differentially expressed in both Tc1 and at least one of the seven 'effective' cell bank overexpression experiments. Of all the points in the graph, the ones with the same sign coordinates (both positive or both negative x, y values) represent genes whose transcriptional up- or down-regulation, observed in at least one of the overexpression experiments, is concordant with the transcriptional changes in the Tc1 cells versus control. A statistically significant 125 out of a total of 168 points fall in same-sign quadrants (P < 1e-6). We also separately compared each of the seven overexpression experiments with Tc1 ES cells (Additional file 16); five out of seven effective genes had a statistically significant number of genes with same sign fold-change as in Tc1 cells (Runx1, Erg, Nrip1, Sim2, Aire; Additional file 17). These observations suggest that the transcriptional features of trisomic Tc1 cells can be partially explained as an additive effect of single gene overexpression, thus highlighting the usefulness of our cell bank in elucidating DS.
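The same-sign count can be checked against chance with a one-sided binomial (sign) test, sketched below. The text does not state which test produced the reported P-value, so the choice of test is an assumption, as is the null hypothesis that each point falls in a same-sign quadrant with probability 0.5; only the counts (125 of 168) come from the text.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Counts reported in the text: 125 of 168 points fall in same-sign quadrants.
p_same_sign = binomial_upper_tail(125, 168)
print(f"one-sided sign-test p-value: {p_same_sign:.2e}")
```

Under these assumptions the tail probability lies well below the 1e-6 bound quoted in the text.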
Figure 4. Comparison of differentially expressed genes following single gene over-expression in our cell bank mouse ES cell lines versus transchromosomic Tc1 mouse ES cell lines. The colors indicate the overexpression experiment in which the expression value was found to be significant; for genes whose expression was significant in more than one overexpression experiment, only the one with the largest absolute value was considered. A total of 168 points are in the graph, of which 125 fall in same-sign quadrants. The regression line was forced to pass through the origin in order to highlight the general trend with respect to zero.
Additional file 16. Summary of results derived from the comparison between the analysis of mES overexpressing effective clones and the transchromosomic Tc1 mouse line.
Additional file 17. Comparison of overexpression experiments with the transcriptional response of the transchromosomic Tc1 mouse line. X-Y graphs comparing the transcriptional response of Tc1 with the response obtained in the individual overexpression experiments. Each dot represents a gene whose differential expression was statistically significant in both the Tc1 and the indicated overexpression experiment. The x axis corresponds to the log of the Tc1 ratio (trisomic versus wild type), and the y axis corresponds to the log of the ratio in the overexpression experiment (induced versus non-induced clone). The ratio of same-sign over total dots is reported for each graph.
Refined analysis of the transcriptional response to the overexpression of silent genes
We next tested whether differentially expressed genes could also be detected in the experiments involving the overexpression of silent genes by using a statistical method more sensitive than the standard t-test approach. The method we selected was Bayesian analysis of variance for microarrays [27-29], a Bayesian spike and slab hierarchical model, as implemented in the BAMarray tool (BAMarray 3.0) [27]. Using this procedure, transcriptional changes were detected in all silent gene overexpression experiments. However, the fold changes of these differentially expressed genes were low, and the resulting gene lists could therefore include more false positives than those obtained with the standard t-test.
To identify biological processes in which the silent genes may be involved, we performed GO enrichment analysis on the lists of newly identified differentially expressed genes. In Additional file 18 we report all the significantly enriched GO terms for 11 out of 13 silent genes (for the remaining two silent genes, Ets2 and 1810007M14Rik, no significant GO terms were found). In Additional file 19 we report the subset of significant GO terms for 5 (Bach1, Dscr1-Rcan1, DYRK1A, Gabpa and SNF1LK) out of 13 silent genes that are in agreement with the known functions of these genes, as determined by evaluation of the literature.
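The enrichment test itself is not specified in this section. One common choice, assumed here purely for illustration, is a one-sided hypergeometric test: given a background of genes, some of which are annotated with a GO term, it asks how likely it is to see at least the observed number of annotated genes in the differentially expressed list by chance. The function name and all numbers below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population: int, annotated: int,
                         sample: int, hits: int) -> float:
    """P(X >= hits) for a hypergeometric draw without replacement.

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     differentially expressed genes
    hits:       differentially expressed genes carrying the GO term
    """
    max_hits = min(annotated, sample)
    total_draws = comb(population, sample)
    # math.comb(n, k) returns 0 when k > n, so impossible terms drop out.
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / total_draws


# Hypothetical example: 20,000 background genes, 200 annotated with the
# term, 150 differentially expressed genes, 10 of which are annotated.
p_enrichment = hypergeom_upper_tail(20000, 200, 150, 10)
```

In practice the P-values for all tested GO terms would also need a multiple-testing correction, which this sketch omits.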
Additional file 18. A complete list of GO terms significantly enriched in the subsets of genes differentially expressed after overexpression of 11 out of 13 silent genes.
Additional file 19. GO enrichment analysis for five (Bach1, Dscr1-Rcan1, DYRK1A, Gabpa and SNF1LK) out of thirteen silent genes, as assessed by microarray analysis. Supporting references are given for a subset of the significant biological processes identified by the GO analysis.
Proteome analysis in mES cells overexpressing the Runx1 gene
To assess whether the overexpression of single genes in mES cells causes changes in the proteome comparable to those detected by microarray hybridization experiments, we performed a full proteomic analysis following overexpression of the transcription factor Runx1. This involved high resolution large-gel two-dimensional electrophoresis (2DGE) followed by protein identification by database-assisted mass spectrometry. In a pilot 2DGE assay on a single Runx1-overexpressing clone (E6), the peak of response at the proteomic level was observed 48 hours after depletion of Tc, rather than at 24 hours as observed at the transcriptome level for this gene. This delay is consistent with changes in protein levels following changes in mRNA levels. We therefore performed the analysis on two Runx1-overexpressing clones (E6 and E7; Additional file 3) by comparing the 2DGE results obtained from the non-induced state (that is, cells grown in the presence of Tc) with those from cells grown in medium deprived of Tc for 48 hours (that is, cells overexpressing the Runx1 protein). For each of the two Runx1-overexpressing clones, three technical replicates were generated (see Materials and methods). Our 2DGE image data have been submitted to the World-2DPAGE Repository of the ExPASy Proteomics Server [2DPAGE:0021] [30] for public access [31].
The induction of Runx1 changes the expression of at least 54 proteins (Additional file 20). Of these, 24 were consistently down-regulated while 30 were up-regulated after 48 hours of induction of the protein Runx1. The effect of Runx1 overexpression on the proteome was compared with the effect on the transcriptome, as detected by microarray.
Additional file 20. Differential protein expression in mES cells overexpressing Runx1. In this table we report the complete list of the proteins whose expression changed following the induction of Runx1.
In Table 2, we compare changes in protein levels 48 hours after induction of Runx1 to changes in mRNA levels 24 hours after induction of Runx1. There is a substantial overlap between the microarray data and the data obtained from the 2DGE assay, with 15 out of 17 affected gene/protein pairs showing similar trends of expression variation: 6 out of 24 down-regulated proteins and 9 out of 31 up-regulated proteins displayed similar trends in the corresponding transcripts by microarray analysis. Only two gene/protein pairs, apoE and Sept1, showed opposite behavior in the protein versus microarray assays. Both proteins were up-regulated while their mRNA levels were down-regulated, which suggests that the mRNAs of these two genes might be unstable while the corresponding proteins have longer half-lives.
Table 2. Correlation between differential protein expression by 2DGE (protein ratio) and differential gene expression by microarray (mRNA ratio)





