<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
<ui>gb-2012-13-10-r94</ui>
<ji>1465-6906</ji>
<fm>
<dochead>Research</dochead>
<bibl>
<title><p>Contribution of the epigenetic mark H3K27me3 to functional divergence after whole genome duplication in <it>Arabidopsis</it></p></title>
<aug>
<au ca="yes" id="A1"><snm>Berke</snm><fnm>Lidija</fnm><insr iid="I1"/><email>l.berke@uu.nl</email></au>
<au id="A2"><snm>Sanchez-Perez</snm><mi>F</mi><fnm>Gabino</fnm><insr iid="I1"/><insr iid="I2"/><insr iid="I3"/><email>gabino.sanchez.perez@gmail.com</email></au>
<au id="A3"><snm>Snel</snm><fnm>Berend</fnm><insr iid="I1"/><email>b.snel@uu.nl</email></au>
</aug>
<insg>
<ins id="I1"><p>Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Padualaan 8, 3584 CH Utrecht, Netherlands</p></ins>
<ins id="I2"><p>Netherlands Consortium for Systems Biology, Science Park 904, 1098 XH Amsterdam, The Netherlands</p></ins>
<ins id="I3"><p>Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, Netherlands</p></ins>
</insg>
<source>Genome Biology</source>
<issn>1465-6906</issn>
<pubdate>2012</pubdate>
<volume>13</volume>
<issue>10</issue>
<fpage>R94</fpage>
<url>http://genomebiology.com/2012/13/10/R94</url>
<xrefbib><pubidlist><pubid idtype="pmpid">23034476</pubid><pubid idtype="doi">10.1186/gb-2012-13-10-r94</pubid></pubidlist></xrefbib></bibl>
<history><rec><date><day>20</day><month>8</month><year>2012</year></date></rec><revrec><date><day>10</day><month>9</month><year>2012</year></date></revrec><acc><date><day>3</day><month>10</month><year>2012</year></date></acc><pub><date><day>3</day><month>10</month><year>2012</year></date></pub></history>
<cpyrt><year>2012</year><collab>Berke et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec><st><p>Abstract</p></st>
<sec><st><p>Background</p></st>
<p>Following gene duplication, retained paralogs undergo functional divergence, which is reflected in changes in DNA sequence and expression patterns. The extent of divergence is influenced by several factors, including protein function. We examine whether an epigenetic modification, trimethylation of histone H3 at lysine 27 (H3K27me3), could be a factor in the evolution of expression patterns after gene duplication. Whereas in animals this repressive mark for transcription is deposited on long regions of DNA, in plants its localization is gene-specific. Because of this and a well-annotated recent whole-genome duplication, <it>Arabidopsis thaliana </it>is uniquely suited for studying the potential association of H3K27me3 with the evolutionary fate of genes.</p>
</sec>
<sec><st><p>Results</p></st>
<p>Paralogous pairs with H3K27me3 show the highest coding sequence divergence, which can be explained by their low expression levels. Interestingly, they also show the highest similarity in expression patterns and upstream regulatory regions, while paralogous pairs where only one gene is an H3K27me3 target show the highest divergence in expression patterns and upstream regulatory sequence. These trends in divergence of expression and upstream regions are especially pronounced for transcription factors.</p>
</sec>
<sec><st><p>Conclusions</p></st>
<p>After duplication, a histone modification can be associated with a particular fate of paralogs: H3K27me3 is linked to lower expression divergence yet higher coding sequence divergence. Our results show that H3K27me3 constrains expression divergence after duplication. Moreover, its association with higher conservation of upstream regions provides a potential mechanism for the conserved H3K27me3 targeting of the paralogs.</p>
</sec>
</sec>
</abs>
</fm>
<meta>
<classifications>
<classification id="30010008" subtype="man_spc_id" type="BMC">Evolution</classification>
<classification id="300100010" subtype="man_spc_id" type="BMC">Genome studies</classification>
<classification id="300100015" subtype="man_spc_id" type="BMC">Model organisms</classification>
<classification id="300100016" subtype="man_spc_id" type="BMC">Molecular biology</classification>
<classification id="300100019" subtype="man_spc_id" type="BMC">Plant biology</classification>
</classifications>
</meta>
<bdy>
<sec><st><p>Background</p></st>
<p>Trimethylation of histone H3 at lysine 27 (H3K27me3) is a histone modification with an important role in regulation of gene expression <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It is generally associated with low expression levels and known as a repressive mark for transcription. Its function is conserved from animals to plants; however, there are several differences between the two kingdoms <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. In animals, H3K27me3 marks long multi-gene regions of DNA while in plants it exhibits gene-specific positioning, starting at promoters and extending to the 3' end of the transcribed region, with a bias towards the 5' end of the gene <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. It is deposited by Polycomb Repressive Complex 2 (PRC2) <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Interestingly, plants have several PRC2 complexes <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> that share some of their target genes while keeping a subset of targets unique for each complex <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. It is not precisely known what directs PRC2 to its target genes in plants <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
<p>Functionally, H3K27me3 does not act as an all-on or all-off switch; instead, its placement is intricately regulated based on tissue type or environmental factors <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, similar to the gene-specific manner of regulation by transcription factors. For example, neighboring H3K27me3 target genes show no correlation in expression <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Genes with this epigenetic mark are functionally enriched for transcription factor activity, and are often involved in important processes in development <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. In plants they are precisely regulated, showing tissue- or developmental stage-specific expression <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
<p>Little is known about the evolutionary processes shaping these expression patterns. In yeast and human, expression divergence between paralogs is correlated with coding sequence divergence <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, which is another measure of functional divergence. In plants, however, explaining expression divergence has proven to be a challenge. In <it>Arabidopsis thaliana</it>, old paralogs have diverged more in their expression patterns than newly duplicated genes, yet there is large variability within both groups <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. It remains unresolved whether or not expression divergence correlates with the rate of coding sequence evolution <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Upstream regulatory sequence divergence is weakly correlated to expression divergence only for tandemly duplicated genes <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Additionally, the rate of expression divergence depends on protein function as well as the size and colinearity of the duplicated region <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, showing that a plethora of factors influence the rate of expression divergence between paralogs, and thereby their function.</p>
<p>In our work, we ask whether H3K27me3 target genes show different trends in functional divergence after gene duplication than non-target genes. To achieve this we analyzed paralogs from the latest whole-genome duplication (WGD) in <it>A. thaliana</it>. The choice of model is warranted by the gene-specific positioning of H3K27me3 and a well-annotated recent WGD <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B19">19</abbr></abbrgrp>. We determined divergence of coding sequences, upstream regulatory regions, and expression patterns. We show that H3K27me3 correlates with different rates of expression pattern divergence of <it>A. thaliana </it>paralogs. Paralogous pairs in which both genes are H3K27me3 targets exhibit a slower rate of functional evolution as measured by expression pattern and regulatory sequence divergence. Paralogous pairs with only one H3K27me3 target gene, however, exhibit the most divergent expression patterns and regulatory sequences. On the other hand, divergence of coding sequence is the highest for H3K27me3 target paralogous pairs, and the lowest for non-target paralogs. This trend can be explained by expression levels <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>; namely, paralogs with H3K27me3 have lower expression and faster coding sequence evolution. The trends in expression pattern and regulatory sequence divergence are especially prominent in transcription factors, the most abundant protein function among the H3K27me3 target genes. We show that, after a WGD, a histone modification is associated with slower divergence of expression patterns.</p>
</sec>
<sec><st><p>Results</p></st>
<sec><st><p>Rate of expression divergence is associated with H3K27me3</p></st>
<p>To examine the correlation of H3K27me3 with the evolutionary fate of genes, we focused on paralogs arising from the most recent (3R or &#945;) <it>A. thaliana </it>WGD. The advantage of limiting the analysis to a single WGD is that the resulting genes are of the same age and that the divergence time is thus equal for all of them, allowing us to simplify the analysis by eliminating time as a variable. Moreover, paralogs from large-scale duplications are more likely to be copied in their entirety, with intact coding and regulatory sequences. Additionally, because it is the most recent WGD, many paralogs are retained and relationships between them are well resolved. We used paralogous pairs as defined by Bowers and colleagues <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, a dataset consisting of 3,817 pairs.</p>
<p>Several genome-wide analyses have reported datasets with H3K27me3 target genes <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>, most of them using entire <it>A. thaliana </it>seedlings despite the tissue-specific nature of the mark. These datasets therefore provide information about an <it>'</it>average cell<it>' </it>in a seedling. We use them as a proxy for the entire plant: H3K27me3 is either present at a gene in any of the plant tissues or not present at all, simplifying H3K27me3 to a binary property of a gene.</p>
<p>To obtain a reliable set of target genes, we created a combined dataset consisting of genes reported in at least two out of three independent genome-wide experiments analyzing H3K27me3 localization in <it>A. thaliana </it>seedlings <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>, totaling 6,338 genes (Figure s1 in Additional file <supplr sid="S1">1</supplr>; Additional file <supplr sid="S2">2</supplr>). As we consider H3K27me3 a binary property of a gene and compare pairs of paralogs, there are three possible outcomes resulting in three classes of paralogous pairs. The largest class, with 2,534 pairs, consists of paralogous pairs without H3K27me3, and is named <it>none</it>. In 18% of the cases one of the paralogs in the pair carries H3K27me3; these 652 pairs constitute the class <it>mixed</it>. The smallest class is <it>both</it>, consisting of 448 pairs (12%) (Additional file <supplr sid="S3">3</supplr>).</p>
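<p>The three-class scheme described above can be illustrated with a minimal sketch. It assumes the target genes are available as a set of identifiers and the paralogous pairs as tuples; the function name <it>classify_pair</it> and the toy identifiers are hypothetical and are not taken from the original analysis.</p>

```python
from collections import Counter


def classify_pair(gene_a, gene_b, targets):
    """Return 'none', 'mixed' or 'both' for one paralogous pair."""
    # Booleans add up to 0, 1 or 2 marked genes in the pair.
    marked = (gene_a in targets) + (gene_b in targets)
    return ("none", "mixed", "both")[marked]


# Hypothetical toy data; the identifiers are placeholders, not real calls.
targets = {"geneA", "geneB", "geneC"}
pairs = [("geneA", "geneB"), ("geneC", "geneD"), ("geneE", "geneF")]

class_counts = Counter(classify_pair(a, b, targets) for a, b in pairs)
print(class_counts)
```

<p>Treating the mark as a binary property keeps the classification to a simple set-membership lookup per gene.</p>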
<suppl id="S1">
<title><p>Additional file 1</p></title>
<text><p><b>Supplementary figures</b>.</p></text>
<file name="gb-2012-13-10-r94-S1.pdf">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S2">
<title><p>Additional file 2</p></title>
<text><p><b>H3K27me3 target genes</b>.</p></text>
<file name="gb-2012-13-10-r94-S2.txt">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S3">
<title><p>Additional file 3</p></title>
<text><p><b>Paralogous pairs used in the analysis, and their properties</b>.</p></text>
<file name="gb-2012-13-10-r94-S3.txt">
   <p>Click here for file</p>
</file>
</suppl>
<p>To determine if there is a relationship between the divergence of expression patterns of paralogs and mark presence, we calculated correlation in expression patterns for the three classes of paralogs. We obtained a number of publicly available microarrays from CORNET <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. As H3K27me3 has been shown to play a role in developmental processes as well as in responses to environmental changes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, the experiments range from various tissue types to different stress responses. The class with the highest expression correlation is <it>both</it>, with a median Pearson correlation coefficient of 0.49 (Figure <figr fid="F1">1a</figr>). It is followed by paralogous pairs without marks (<it>none</it>), with a median of 0.42. The two distributions are significantly different (Kolmogorov-Smirnov two-sided test, <it>P</it>-value 4.52e-5). Pairs in the class <it>mixed </it>show the highest divergence in expression with a distinctly lower median correlation of 0.16. This class is the closest to the random distribution (median 0.00), which was created by randomly combining genes into 10,000 pairs and calculating their expression correlation. <it>Mixed </it>is also significantly different from distributions where genes share the mark status (<it>P</it>-value 1.66e-15 for <it>both</it>, <it>P</it>-value &lt;2.2e-16 for <it>none</it>). Remarkably, target genes of H3K27me3 show a common pattern in expression divergence: paralogs with H3K27me3 maintain more similar expression patterns.</p>
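<p>As a rough illustration of the comparison above, the following sketch computes the Pearson correlation of expression profiles for each pair, the median per set of pairs, and a baseline from randomly combined genes. The expression values, gene names and helper names are hypothetical, and the Kolmogorov-Smirnov test used in the text is not reproduced here.</p>

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    norm = math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    return cov / norm


def median_pair_correlation(pairs, expression):
    """Median expression correlation over a list of gene pairs."""
    return statistics.median(
        pearson(expression[a], expression[b]) for a, b in pairs
    )


def random_pairs(genes, n_pairs, seed=0):
    """Randomly combine genes into pairs of two distinct genes."""
    rng = random.Random(seed)
    return [tuple(rng.sample(genes, 2)) for _ in range(n_pairs)]


# Hypothetical expression profiles (one value per microarray experiment).
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
observed = median_pair_correlation([("g1", "g2"), ("g3", "g4")], expression)
baseline = median_pair_correlation(random_pairs(sorted(expression), 100), expression)
print(observed, baseline)
```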
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Correlation of expression patterns of paralogous pairs</p></caption><text>
   <p><b>Correlation of expression patterns of paralogous pairs</b>. <b>(a) </b>All paralogous pairs. <b>(b) </b>Paralogous pairs with transcription factor (TF) activity.</p>
</text><graphic file="gb-2012-13-10-r94-1"/></fig>
<p>We next wanted to resolve whether this surprising separation of class distributions is caused by the uneven distribution of gene functions among the three classes. For example, transcription factors were reported to be the most enriched gene ontology category among the H3K27me3 target genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, and they are expected to be tightly regulated due to their crucial role in the regulatory network. While transcription factors from the 3R duplication retain more similar expression profiles than genes with other functions regardless of their class (Figure <figr fid="F1">1b</figr>; Figure s2 in Additional file <supplr sid="S1">1</supplr>), transcription factors in the class <it>both </it>(78 paralogous pairs) retain the most similar expression patterns, with a median expression correlation coefficient of 0.65. As in Figure <figr fid="F1">1a</figr>, it is followed by the class <it>none </it>(152 pairs; median 0.48) and the <it>mixed </it>class (44 pairs; median 0.41). Despite the small number of pairs in the distributions, the class <it>both </it>is significantly different from <it>none </it>(Kolmogorov-Smirnov two-sided test, <it>P</it>-value 1.1e-3) and the class <it>mixed </it>(<it>P</it>-value 1.2e-3); however, classes <it>none </it>and <it>mixed </it>are not significantly different from each other (<it>P</it>-value 0.09). Similar to other 3R paralogs, the transcription factor paralogs that are H3K27me3 target genes show more highly correlated expression patterns than the classes <it>none </it>and <it>mixed</it>. Thus, the difference between classes is also evident within a group of proteins with a similar function. Hence, proteins with transcription factor activity are not the main determinant for the trends we observed (Figure s2 in Additional file <supplr sid="S1">1</supplr>).</p>
</sec>
<sec><st><p>Expression levels of H3K27me3 target genes explain coding sequence divergence but not expression divergence</p></st>
<p>Functional divergence of paralogs is not only estimated by analyzing differences in expression patterns, but also by determining differences in coding sequence. A positive relationship between the two measures has been observed in fungi and animals but is likely absent in plants <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. We therefore next wanted to determine if divergence of coding regions also shows separation of the distributions of the three classes, and if so, in what order. For every paralogous pair, we calculated the number of nonsynonymous substitutions per nonsynonymous site (dN). The distributions of <it>none </it>and <it>both </it>are clearly separated (Figure <figr fid="F2">2a</figr>): genes in <it>none </it>tend to undergo the smallest number of nonsynonymous substitutions (median dN 0.14). They are followed by paralogs with H3K27me3 (median dN 0.20). The two distributions are significantly different (Kolmogorov-Smirnov two-sided test, <it>P</it>-value &lt;2.2e-16). <it>Mixed </it>has a median dN of 0.22 and a distribution different from that of <it>none </it>(<it>P</it>-value &lt;2.2e-16) but not <it>both </it>(<it>P</it>-value 0.22). In contrast to expression divergence, where only <it>mixed </it>shows the lowest conservation, <it>both </it>also shows low sequence conservation. This trend is also present for synonymous substitutions per synonymous site (dS) distributions, with class <it>both </it>showing the highest dS values (Figure s3 in Additional file <supplr sid="S1">1</supplr>). The opposite trends in coding sequence and expression pattern divergence suggest not only a lack of correlation between the two, as reported previously <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, but, for H3K27me3 target genes, additionally a negative relationship between sequence and expression divergence. Sequence divergence cannot, therefore, explain the trends in expression divergence that we observed, and instead seems to be under the influence of different factors.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Coding sequence divergence and gene expression levels</p></caption><text>
   <p><b>Coding sequence divergence and gene expression levels</b>. <b>(a) </b>Distribution of dN (Ka) values. <b>(b) </b>Distribution of joint gene expression values for paralogous pairs.</p>
</text><graphic file="gb-2012-13-10-r94-2"/></fig>
<p>A possible factor for the faster sequence divergence of H3K27me3 target genes is their lower expression level compared to non-target genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Expression level has already been shown to be the main determinant of sequence divergence for a range of organisms, including <it>A. thaliana </it><abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. Low sequence divergence of highly expressed proteins reflects selection against mistranslation and misfolding of the proteins, as these two outcomes present a high fitness cost for the cell. We thus hypothesized that the lower expression levels of H3K27me3 target genes could explain the trends in coding sequence divergence (Figure <figr fid="F2">2a</figr>). To test this, we summed the expression level of both paralogs in a pair across a number of microarray experiments <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Despite the noise that could be introduced by summing expression levels of two genes for each data point, the three distributions are significantly different (Figure <figr fid="F2">2b</figr>; Kolmogorov-Smirnov two-sided test, <it>P</it>-value &lt;2.2e-16, &lt;2.2e-16, and 5.4e-6 for the comparisons <it>both</it>-<it>none</it>, <it>mixed</it>-<it>none</it>, and <it>mixed</it>-<it>both</it>, respectively). As expected from previous results <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, paralogous pairs with H3K27me3 (class <it>both</it>) indeed have the lowest expression levels, and pairs that belong to <it>none </it>have the highest expression. With <it>mixed </it>placed much closer to <it>both </it>than to <it>none</it>, the order of distributions is the same as for coding sequence divergence (Figure <figr fid="F2">2a</figr>).
This corroborates the previously postulated link between coding sequence divergence and gene expression levels <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> and explains the sequence divergence in relation to mark status.</p>
<p>There is a possibility that low expression alone might lead to higher co-expression. In this case, the higher co-expression of paralogs in class <it>both </it>would be the result of their low expression. To address this confounding factor, we separated all paralogous pairs (regardless of which class they belong to) into five expression level categories (Figure s4 in Additional file <supplr sid="S1">1</supplr>), each containing 20% of the total number of paralogous pairs. Throughout the expression level categories, the most coexpressed class is <it>both</it>, followed by <it>none </it>and <it>mixed</it>. Furthermore, expression level is positively correlated with expression correlation (Figure s5 in Additional file <supplr sid="S1">1</supplr>); that is to say, lowly expressed genes tend to have low correlation, so low expression alone would, if anything, reduce co-expression. Thus, low expression is not a confounding factor for our main observation.</p>
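<p>The stratification into five equally sized expression level categories could be sketched as follows. The dictionaries of summed expression levels, class labels and correlations are hypothetical inputs, and ties in expression level are resolved simply by input order, which is an assumption of this sketch.</p>

```python
import statistics
from collections import defaultdict


def expression_level_categories(pair_levels, n_categories=5):
    """Map each pair to a category index (0 = lowest expression).

    Pairs are ranked by summed expression level and split into
    n_categories groups of (nearly) equal size.
    """
    ranked = sorted(pair_levels, key=pair_levels.get)
    n = len(ranked)
    return {pair: (rank * n_categories) // n for rank, pair in enumerate(ranked)}


def median_by_category_and_class(categories, pair_class, pair_corr):
    """Median expression correlation per (category, class) combination."""
    grouped = defaultdict(list)
    for pair, category in categories.items():
        grouped[(category, pair_class[pair])].append(pair_corr[pair])
    return {key: statistics.median(values) for key, values in grouped.items()}


# Hypothetical toy input: ten pairs with increasing summed expression.
pair_levels = {f"pair{i}": float(i) for i in range(10)}
pair_class = {f"pair{i}": ("both" if i % 2 == 0 else "none") for i in range(10)}
pair_corr = {f"pair{i}": 0.1 * i for i in range(10)}

categories = expression_level_categories(pair_levels)
print(median_by_category_and_class(categories, pair_class, pair_corr))
```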
<p>As the precise mechanism of H3K27me3 regulation is not known, we do not know whether low expression at a locus is a factor inducing trimethylation of K27 at that locus, or, conversely, whether low expression is simply the result of H3K27me3, which was directed to the locus by an unknown signal. We have shown that H3K27me3 is associated with a slower rate of expression pattern evolution, but cannot say whether it is also the cause.</p>
</sec>
<sec><st><p>Regulatory sequence divergence of H3K27me3 targets corresponds to divergence in their expression patterns</p></st>
<p>Different regulatory mechanisms come together to shape gene expression patterns; while our focus is epigenetic modifications, transcription factors binding short DNA elements have a more direct effect on transcription. To see if paralogs with H3K27me3, which have more conserved expression patterns, also show more conserved upstream regulatory regions, we compared 500 bp upstream regions of paralogs. We used SharMot <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> to calculate the shared motif divergence score (dSM), which ranges from 0, for identical sequences, to 1, which means no similarity between the two sequences (Additional file <supplr sid="S4">4</supplr>). The dSM score was also calculated for 10,000 randomly combined pairs. We consider dSM values lower than the fifth percentile of the randomly combined upstream regions (dSM = 0.94; Figure <figr fid="F3">3</figr>) to be indicative of conserved regulatory sites. We used this 5% cutoff to determine the optimal minimal length of the conserved upstream sequences (18 bp), and promoter length (500 bp). A shorter minimal length of conserved upstream sequences or a longer promoter dramatically increases the number of false positives (determined by the number of hits in randomly combined pairs) in comparison to the number of all found conserved sequences (determined by number of hits in paralogous pairs).</p>
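<p>The cutoff logic described above can be sketched in a few lines: take the dSM value at the fifth percentile of the random pairs and count the paralogous pairs that fall below it. The nearest-rank percentile definition and the function names are assumptions of this sketch, the scores are hypothetical, and SharMot itself is not reimplemented.</p>

```python
import math


def percentile_cutoff(random_scores, fraction=0.05):
    """dSM value at the given lower fraction of the random-pair scores.

    Uses the nearest-rank definition of a percentile.
    """
    ordered = sorted(random_scores)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def fraction_conserved(pair_scores, cutoff):
    """Fraction of pairs with a dSM strictly lower than the cutoff."""
    below = sum(1 for score in pair_scores if cutoff > score)
    return below / len(pair_scores)


# Hypothetical scores: lower dSM means more similar upstream regions.
random_scores = [0.90, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00, 1.00, 1.00, 1.00] * 2
paralog_scores = [0.50, 0.85, 0.93, 1.00]

cutoff = percentile_cutoff(random_scores)
print(cutoff, fraction_conserved(paralog_scores, cutoff))
```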
<suppl id="S4">
<title><p>Additional file 4</p></title>
<text><p><b>Conserved upstream regions</b>.</p></text>
<file name="gb-2012-13-10-r94-S4.zip">
   <p>Click here for file</p>
</file>
</suppl>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Conservation of upstream regulatory regions as measured by dSM</p></caption><text>
   <p><b>Conservation of upstream regulatory regions as measured by dSM</b>. <b>(a) </b>Distribution of dSM scores between all paralogous pairs, according to H3K27me3. The dashed vertical line shows the dSM value at the fifth percentile of the random pairs (0.94). <b>(b) </b>Frequency of paralogous pairs with dSM lower than the fifth percentile cutoff.</p>
</text><graphic file="gb-2012-13-10-r94-3"/></fig>
<p>The most similar upstream regions are those of class <it>both </it>(41% of all pairs), followed by <it>none </it>(26%) and <it>mixed </it>(23%) (Figure <figr fid="F3">3a</figr>). Transcription factors show even higher similarity: 63%, 47% and 45% of pairs, respectively, have significantly similar upstream regions (Figure <figr fid="F3">3b</figr>). The difference between <it>both </it>and <it>mixed</it>, and <it>both </it>and <it>none </it>is statistically significant (two-sample test for equality of proportions with continuity correction; <it>P</it>-values 1.02e-7 and 1.88e-7, respectively). While the difference between transcription factor-only classes is not significant due to the low number of pairs, there is a significant difference between all gene and transcription factor classes (<it>P</it>-value 0.0007 for <it>both</it>, 0.0015 for <it>mixed </it>and 4.58e-8 for <it>none</it>).</p>
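<p>A two-sample test for equality of proportions with continuity correction is equivalent to a chi-square test with Yates' correction on a 2x2 table. The sketch below shows that calculation; the counts are hypothetical and are not the counts underlying the reported <it>P</it>-values.</p>

```python
import math


def two_proportion_test(hits_1, n_1, hits_2, n_2):
    """Two-sided test for equality of two proportions (Yates-corrected).

    Returns (chi_square, p_value) for one degree of freedom. Assumes
    that neither the hit column nor the miss column is empty.
    """
    a, b = hits_1, n_1 - hits_1
    c, d = hits_2, n_2 - hits_2
    total = n_1 + n_2
    # Continuity correction, floored at zero for near-identical proportions.
    corrected = max(0.0, abs(a * d - b * c) - total / 2)
    chi_square = total * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical counts: 40 of 100 conserved pairs versus 20 of 100.
chi_square, p_value = two_proportion_test(40, 100, 20, 100)
print(chi_square, p_value)
```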
<p>Notably, the number of conserved upstream regulatory sequences is likely even higher, as we only report conserved sequences within promoters of 500 bp in length. Freeling and colleagues <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> examined the upstream regions of &#945; WGD paralogs and found a number of genes rich in conserved upstream regions. They are significantly overrepresented in class <it>both </it>(<it>P</it>-value 3.37e-11, hypergeometric test) but not in <it>none </it>or <it>mixed </it>(<it>P</it>-value 1 and 0.56, respectively), in agreement with our findings. Paralogs with H3K27me3 have more conserved upstream regions, followed by <it>none </it>and <it>mixed</it>, which is comparable to the trend in expression pattern divergence, indicating that conserved upstream regions might hold the answer to different levels of expression pattern divergence.</p>
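<p>The overrepresentation test mentioned above can be sketched as the upper tail of a hypergeometric distribution. The parameter names and the numbers in the usage lines are hypothetical; the sketch does not reproduce the reported <it>P</it>-values.</p>

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric distribution.

    population: all paralogous pairs considered
    successes:  pairs flagged as rich in conserved upstream regions
    draws:      pairs in the class of interest
    observed:   flagged pairs found in that class
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


# Hypothetical numbers: 5 flagged pairs among 10, a class of 5 pairs
# containing all 5 flagged pairs.
p_enriched = hypergeom_upper_tail(10, 5, 5, 5)
print(p_enriched)
```

<p>Exact integer arithmetic is used for the sum, so the only rounding occurs in the final division.</p>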
</sec>
</sec>
<sec><st><p>Discussion</p></st>
<p>In <it>A. thaliana</it>, the histone mark H3K27me3 localizes to individual genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, enabling us to follow the changes in each gene separately. One observation, higher sequence divergence of H3K27me3 target genes (Figure <figr fid="F2">2</figr>), can be explained by their lower expression levels, a correlation that has been reported previously <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. More importantly, our analysis reveals a relationship between H3K27me3 target genes and conservation of expression patterns (Figure <figr fid="F1">1</figr>). We exclude low expression value as a confounding factor for our observation (Figure s4 in Additional file <supplr sid="S1">1</supplr>).</p>
<p>We aim to uncover an association of H3K27me3 target genes with a particular trend in their evolution, namely lower rate of expression divergence. We measured correlation in expression patterns over numerous different cell or tissue types and treatments to integrate regulatory information over many conditions. The H3K27me3 data were derived from seedlings, and represent a state in an average seedling cell. An average seedling cell is a statistical construct and might represent completely different levels of H3K27me3 in different seedling tissues. We therefore use the gene property <it>'</it>can be marked by H3K27me3<it>' </it>irrespective of the extent to which it is marked in the seedling (the fold-enrichment). This property is binary and allows a simple classification scheme of paralogs to see if they differ in a variety of aspects. In order to obtain a reliable definition of having H3K27me3 or not, we used an integration of datasets, as commonly used in integrative genomics <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, where at least two independent statistically significant calls are required to confirm that a gene is an H3K27me3 target.</p>
<p>Another epigenetic modification, DNA methylation of gene bodies, has been shown to correlate with other gene features in <it>A. thaliana</it>, specifically gene length and number of introns, as well as coding sequence divergence <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Epigenetic mechanisms have also been proposed for other observations, such as preferential deletion of paralogs from one homeolog, after a WGD in <it>A. thaliana </it><abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Our work, however, represents the first time that an association between a histone modification and establishment of expression patterns has been shown.</p>
<p>Based on our observations, we propose the following mechanism. Immediately after the duplication, the selection pressure is relaxed on both paralogs, and they can accumulate mutations and changes in regulation. If both genes keep H3K27me3, their expression patterns are likely to remain similar, possibly due to conserved elements in their upstream regulatory regions. For paralogous pairs without the mark, the expression pattern is mainly the result of transcription factors binding to their binding sites, which in turn also means lower upstream regulatory region conservation. Their expression patterns, though, are less similar than in <it>both </it>because H3K27me3 strongly represses transcription. Class <it>mixed</it>, on the other hand, shows highly divergent expression patterns: the paralog repressed by H3K27me3 will be regulated by a different set of mechanisms and is likely repressed in many tissues, so the resulting expression patterns will differ significantly between the two paralogs.</p>
<p>Paralogs in class <it>mixed </it>are also interesting because they show that H3K27me3 is not evolutionarily inert and that it has been possible to gain or lose the property of having H3K27me3 in the millions of years since the duplication event, or that the parental genomes contributing to the duplication event were not necessarily epigenetically identical (which is likely if the duplication event was an allotetraploidization). In our work, however, we do not aim to reconstruct the ancestral state of H3K27me3 in the parental genome. We analyze current associations between H3K27me3 target genes and their expression levels and correlation to their paralogs. Thus, the possibility that &#945; WGD was an allotetraploidization event does not confound our results.</p>
<p>Because the minimal length of the conserved upstream sequences at which we detect the strongest signal is relatively long (18 bp), these sequences can hardly be attributed to a single transcription factor binding site. Their function is uncertain: some might be <it>cis</it>-regulatory modules, that is, clusters of transcription factor binding sites. As hinted by higher conservation of upstream regulatory regions of paralogs in class <it>both</it>, other conserved upstream sequences might even have an H3K27me3-related function, such as RLE, a 50-bp element that has recently been found to be necessary for H3K27me3 deposition on LEC2 [TAIR: AT1G28300] <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. More work will be needed to define the function of the conserved regions.</p>
</sec>
<sec><st><p>Conclusions</p></st>
<p>H3K27me3 has an important role in regulation of gene expression in animals as well as in plants <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. More so than animals, the plant <it>A. thaliana </it>is a uniquely suited model for our study because of gene-specific positioning of H3K27me3 and its recent WGD. We compared paralogs that emerged at the latest <it>A. thaliana </it>WGD and had the same amount of time to diverge. Because H3K27me3 is a tissue-specific epigenetic mark, and therefore not a permanent modification, it is remarkable that we observe such an effect.</p>
<p>Our first observation is that the rate of expression divergence differs between genes from different classes. Paralogs with H3K27me3 retain more similar expression patterns, while paralogous pairs with only one H3K27me3 target gene diverge the most. Paralogs in this class might show a higher divergence rate because H3K27me3 provides an additional and different layer of transcription regulation, together with transcription factors and other mechanisms. The difference in expression pattern divergence is the most pronounced for transcription factors. We show the same trends for conservation of upstream regulatory regions. In addition, pairs with H3K27me3 also show the highest coding sequence divergence, and are followed by class <it>mixed</it>, whereas pairs without H3K27me3 show the highest conservation of coding sequence. This is closely linked to expression levels, as H3K27me3 is a transcriptionally repressive mark and its target genes are expressed at lower levels.</p>
<p>To our knowledge, our work is the first to report an association between a histone modification and gene fate after duplication, and it highlights the importance of epigenetics in an evolutionary context as well.</p>
</sec>
<sec><st><p>Materials and methods</p></st>
<sec><st><p>Datasets and general layout</p></st>
<p>We obtained paralogous pairs from the latest (3R, or &#945;) <it>A. thaliana </it>whole-genome duplication <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, and genes carrying H3K27me3 from three whole-genome analyses (Figure s1 in Additional file <supplr sid="S1">1</supplr>) <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. The three H3K27me3 datasets were obtained using different methods (ChIP-chip, ChIP-seq, and ChIP-chip, respectively) and slightly differing plant material (10 to 14, 10, and 10 days after germination, respectively). To increase the confidence in our combined dataset, we therefore used only genes that appeared in at least two out of the three datasets (6,338 genes in total). Because several tissue types are represented in a seedling, the H3K27me3 signal reported for a gene is a weighted average over the entire plant. As a consequence, we treat H3K27me3 as a binary property of a gene - that is, it is either present in any tissue or cell type, or not present at all.</p>
<p>The paralogous pairs were classified into three classes based on the number of genes in a pair that had H3K27me3: <it>both </it>(448 pairs), <it>mixed </it>(652 pairs), or <it>none </it>(2,534 pairs).</p>
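<p>The two steps described above (keeping genes supported by at least two datasets, then labelling each pair by the number of marked genes) can be sketched as follows. This is a minimal illustration in Python, not the code used in the study; the function names and the toy gene identifiers are hypothetical.</p>

```python
from collections import Counter


def consensus_targets(datasets, min_support=2):
    """Return the genes reported by at least `min_support` of the gene sets."""
    counts = Counter(gene for genes in datasets for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= min_support}


def classify_pair(pair, targets):
    """Label a paralogous pair by how many of its two genes carry the mark."""
    marked = sum(gene in targets for gene in pair)
    return ("none", "mixed", "both")[marked]


# Toy identifiers for illustration only, not real data.
chip_a = {"g1", "g2", "g3"}
chip_b = {"g2", "g3", "g4"}
chip_c = {"g3", "g5"}

targets = consensus_targets([chip_a, chip_b, chip_c])
pairs = [("g2", "g3"), ("g1", "g3"), ("g1", "g5")]
classes = [classify_pair(pair, targets) for pair in pairs]
```

<p>In this toy input only g2 and g3 are supported by two or more sets, so the three pairs fall into <it>both</it>, <it>mixed</it>, and <it>none</it>, respectively.</p>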
</sec>
<sec><st><p>Coding sequence similarity</p></st>
<p>To calculate coding sequence similarity, protein sequences and coding sequences (genome release version TAIR10) were obtained from TAIR <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. For each paralogous pair we first aligned protein sequences using needle (EMBOSS 6.3.1) <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> (parameters: -gapopen 10.0 -gapextend 0.5), and then performed protein-guided nucleotide alignment using backtrans from treebest 1.9.2 <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> (parameter: -t 0.5). From the resulting alignment we estimated dN and dS with codeml from PAML package v4.4 <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> using the Nei and Gojobori substitution model and the following parameters: noisy = 0; verbose = 2; runmode = -2; seqtype = 1; model = 0; NSsites = 0; icode = 0; fix_alpha = 0; fix_kappa = 0; RateAncestor = 0. Pairs with Ks &gt; 5.0 were discarded because of unreliability of large Ks values, as were pairs with negative Ks values. These anomalies were attributed to changes in genome annotation between TAIR10 and the <it>A. thaliana </it>genome version used in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The remaining 3,634 paralogous pairs (448 in <it>both</it>, 652 in <it>mixed</it>, 2,534 in <it>none</it>) were used in subsequent analysis.</p>
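<p>The Ks filter described above can be expressed as a short sketch. The function name, the dictionary layout, and the example values are assumptions made for illustration; only the thresholds (discarding Ks above 5.0 and negative Ks) come from the text.</p>

```python
def filter_by_ks(ks_by_pair, max_ks=5.0):
    """Keep pairs whose Ks estimate is non-negative and at most `max_ks`."""
    return {
        pair: ks
        for pair, ks in ks_by_pair.items()
        if ks >= 0.0 and max_ks >= ks
    }


# Invented estimates for illustration only.
estimates = {("g1", "g2"): 0.82, ("g3", "g4"): 7.4, ("g5", "g6"): -1.0}
kept = filter_by_ks(estimates)
```

<p>Here only the first toy pair is retained; the second exceeds the cutoff and the third has a negative estimate.</p>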
</sec>
<sec><st><p>Expression</p></st>
<p>Expression correlation was obtained from microarray experiments (annotated as: PO:0009004: gametophyte, PO:0009008: organ, PO:0009002: plant cell, PO:0009003: sporophyte, PO:0009007: tissue, EXT:0000020: abiotic_stress_design, EXT:0000021: biotic_stress_design) from CORNET <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, comprising 2,231 slides (Additional file <supplr sid="S5">5</supplr>). They were normalized in R v2.10.1 using RMA from the affy package. Pearson correlation between two paralogs was calculated using a custom Perl script. As ATH1 microarrays do not contain probes for all <it>A. thaliana </it>genes, and we only made use of unique probes (identifiers ending with _at), the number of pairs was reduced to 319 in class <it>both</it>, 451 in <it>mixed</it>, and 1,865 in <it>none</it>. Thus, the percentage of retained pairs was similar in all classes (71%, 69% and 74% of pairs, respectively).</p>
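<p>For readers who want to reproduce the correlation step, the following is a minimal sketch of the Pearson correlation between the expression profiles of two paralogs. It is written in Python rather than Perl and is not the script used in the study; the function name and the toy profiles are hypothetical.</p>

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Toy profiles over four hypothetical slides, for illustration only.
gene_a = [7.1, 8.4, 9.0, 6.2]
gene_b = [6.9, 8.8, 9.3, 6.0]
r = pearson(gene_a, gene_b)
```

<p>Each profile would hold one normalized value per slide, so in the setting described here both lists would have 2,231 entries.</p>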
<suppl id="S5">
<title><p>Additional file 5</p></title>
<text><p><b>Microarrays used in the study</b>.</p></text>
<file name="gb-2012-13-10-r94-S5.txt">
   <p>Click here for file</p>
</file>
</suppl>
<p>The random distribution was obtained by selecting two genes from the microarray at random 10,000 times and calculating the expression correlation of each random pair. We considered all genes annotated with the Gene Ontology term 'transcription factor activity' [GO:0003700] to be transcription factors.</p>
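<p>A background distribution of this kind can be sketched as below. The sketch assumes that the two genes in a random pair are distinct and that pairs are drawn independently; neither detail is stated in the text. The function name, the seed, and the stand-in scoring function are hypothetical, and in practice <it>score</it> would be the expression correlation of the two genes.</p>

```python
import random


def random_pair_scores(genes, score, n_pairs=10000, seed=0):
    """Score `n_pairs` random pairs of distinct genes drawn from `genes`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(genes)
    return [score(*rng.sample(pool, 2)) for _ in range(n_pairs)]


# Toy per-gene values and a stand-in score, for illustration only.
toy_level = {"g1": 1.0, "g2": 2.5, "g3": 4.0, "g4": 8.0}


def toy_score(gene_a, gene_b):
    """Placeholder pairwise score: absolute difference of the toy values."""
    return abs(toy_level[gene_a] - toy_level[gene_b])


background = random_pair_scores(sorted(toy_level), toy_score, n_pairs=1000)
```

<p>The observed values for each class of paralogs can then be compared against the list returned by this function.</p>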
<p>For analysis of expression levels, the expression values were summed over all experiments for both genes in a paralogous pair. To calculate the linear regression model (Figure s5 in Additional file <supplr sid="S1">1</supplr>), the Pearson correlation coefficient (r) was transformed using ln ((1 + r)/(1 - r)), as has been described previously <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B13">13</abbr></abbrgrp>.</p>
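<p>The two quantities used in the regression can be written out as a short sketch. The transform ln((1 + r)/(1 - r)) is twice the inverse hyperbolic tangent of r, which is undefined at r = 1 and r = -1. The function names and example values are hypothetical.</p>

```python
import math


def transform_r(r):
    """Map a correlation coefficient r to ln((1 + r) / (1 - r))."""
    if abs(r) >= 1.0:
        raise ValueError("the transform is undefined for |r| >= 1")
    return math.log((1.0 + r) / (1.0 - r))


def pair_expression_level(values_a, values_b):
    """Sum expression values over all experiments for both genes of a pair."""
    return sum(values_a) + sum(values_b)


# Invented numbers for illustration only.
t_half = transform_r(0.5)
level = pair_expression_level([7.0, 8.0], [6.0, 9.0])
```

<p>The transform stretches values near the ends of the interval, so differences between highly correlated pairs are not compressed when a linear model is fitted.</p>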
</sec>
<sec><st><p>Similarity of upstream regions</p></st>
<p>The similarity of 500 bp upstream regulatory sequences of paralogs (downloaded from TAIR <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, genome version TAIR10) was calculated using SharMot <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, parameter -l 18. Parameter -l sets the minimal length of a stretch of perfectly matching nucleotides. To obtain a random distribution, we combined randomly selected genes into 10,000 pairs. Comparison with previously reported genes with conserved upstream regions <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> was performed using bigfoot pairs that also appear in the dataset of Bowers <it>et al</it>. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and do not contain '_oa' in their identifiers.</p>
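<p>To illustrate what a minimal perfect-match length of 18 means, the sketch below tests whether two upstream sequences share at least one identical stretch of that length. It is not SharMot and does not reproduce its scoring; it compares the given strand only, and the function names and toy sequences are hypothetical.</p>

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_exact_stretch(seq_a, seq_b, min_len=18):
    """True if the sequences share an identical stretch of min_len or more."""
    if min_len >= 1:
        shared = kmers(seq_a.upper(), min_len) & kmers(seq_b.upper(), min_len)
        return bool(shared)
    raise ValueError("min_len must be a positive integer")


# Toy sequences sharing one 18-nt stretch, for illustration only.
motif = "ACGTACGTTAGCTAGCTA"
upstream_a = "GG" + motif + "CC"
upstream_b = "TTT" + motif + "AAA"
has_match = shares_exact_stretch(upstream_a, upstream_b)
```

<p>Two sequences share a perfect stretch of at least 18 nucleotides exactly when they share an 18-mer, which is why comparing sets of 18-mers is sufficient for this yes-or-no question.</p>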
</sec>
</sec>
<sec><st><p>Abbreviations</p></st>
<p>H3K27me3: trimethylation of histone H3 at lysine 27; WGD: whole-genome duplication.</p>
</sec>
<sec><st><p>Authors' contributions</p></st>
<p>LB, GFSP and BS conceived the study and its design, and LB performed all analyses. LB drafted the primary manuscript. All authors contributed to and approved the final manuscript for publication.</p>
</sec>
</bdy>
<bm>
<ack>
<sec><st><p>Acknowledgements</p></st>
<p>We thank Michael F. Seidl for fruitful discussion of the manuscript and two anonymous reviewers for their insightful comments. We are also grateful to ERA-NET project PcG-code and especially Franziska Turck for early access to their H3K27me3 dataset.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>The Polycomb complex PRC2 and its mark in life.</p></title><aug><au><snm>Margueron</snm><fnm>R</fnm></au><au><snm>Reinberg</snm><fnm>D</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>469</volume><fpage>343</fpage><lpage>349</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09784</pubid><pubid idtype="pmpid" link="fulltext">21248841</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Programming of gene expression by Polycomb group proteins.</p></title><aug><au><snm>K&#246;hler</snm><fnm>C</fnm></au><au><snm>Villar</snm><fnm>CBR</fnm></au></aug><source>Trends Cell Biol</source><pubdate>2008</pubdate><volume>18</volume><fpage>236</fpage><lpage>243</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tcb.2008.02.005</pubid><pubid idtype="pmpid" link="fulltext">18375123</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Whole-genome analysis of histone H3 lysine 27 trimethylation in <it>Arabidopsis</it>.</p></title><aug><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Clarenz</snm><fnm>O</fnm></au><au><snm>Cokus</snm><fnm>S</fnm></au><au><snm>Bernatavichute</snm><fnm>YV</fnm></au><au><snm>Pellegrini</snm><fnm>M</fnm></au><au><snm>Goodrich</snm><fnm>J</fnm></au><au><snm>Jacobsen</snm><fnm>SE</fnm></au></aug><source>PLoS Biol</source><pubdate>2007</pubdate><volume>5</volume><fpage>e129</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pbio.0050129</pubid><pubid idtype="pmcid">1852588</pubid><pubid idtype="pmpid" link="fulltext">17439305</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Diversity of Polycomb group complexes in plants: same rules, different players?</p></title><aug><au><snm>Hennig</snm><fnm>L</fnm></au><au><snm>Derkacheva</snm><fnm>M</fnm></au></aug><source>Trends Genet</source><pubdate>2009</pubdate><volume>25</volume><fpage>414</fpage><lpage>423</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tig.2009.07.002</pubid><pubid idtype="pmpid" 
link="fulltext">19716619</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Polycomb group complexes mediate developmental transitions in plants.</p></title><aug><au><snm>Holec</snm><fnm>S</fnm></au><au><snm>Berger</snm><fnm>F</fnm></au></aug><source>Plant Physiol</source><pubdate>2012</pubdate><volume>158</volume><fpage>35</fpage><lpage>43</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.111.186445</pubid><pubid idtype="pmpid" link="fulltext">22086420</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Different Polycomb group complexes regulate common target genes in <it>Arabidopsis</it>.</p></title><aug><au><snm>Makarevich</snm><fnm>G</fnm></au><au><snm>Leroy</snm><fnm>O</fnm></au><au><snm>Akinci</snm><fnm>U</fnm></au><au><snm>Schubert</snm><fnm>D</fnm></au><au><snm>Clarenz</snm><fnm>O</fnm></au><au><snm>Goodrich</snm><fnm>J</fnm></au><au><snm>Grossniklaus</snm><fnm>U</fnm></au><au><snm>K&#246;hler</snm><fnm>C</fnm></au></aug><source>EMBO Rep</source><pubdate>2006</pubdate><volume>7</volume><fpage>947</fpage><lpage>952</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.embor.7400760</pubid><pubid idtype="pmcid">1559666</pubid><pubid idtype="pmpid" link="fulltext">16878125</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Dynamics of histone H3 lysine 27 trimethylation in plant development.</p></title><aug><au><snm>Zheng</snm><fnm>B</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au></aug><source>Curr Opin Plant Biol</source><pubdate>2011</pubdate><volume>14</volume><fpage>112</fpage><lpage>129</lpage></bibl><bibl id="B8"><title><p>From decision to commitment: the molecular memory of flowering.</p></title><aug><au><snm>Adrian</snm><fnm>J</fnm></au><au><snm>Torti</snm><fnm>S</fnm></au><au><snm>Turck</snm><fnm>F</fnm></au></aug><source>Mol Plant</source><pubdate>2009</pubdate><volume>2</volume><fpage>628</fpage><lpage>642</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/mp/ssp031</pubid><pubid idtype="pmpid" 
link="fulltext">19825644</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>H3K27me3 profiling of the endosperm implies exclusion of polycomb group protein targeting by DNA methylation.</p></title><aug><au><snm>Weinhofer</snm><fnm>I</fnm></au><au><snm>Hehenberger</snm><fnm>E</fnm></au><au><snm>Roszak</snm><fnm>P</fnm></au><au><snm>Hennig</snm><fnm>L</fnm></au><au><snm>K&#246;hler</snm><fnm>C</fnm></au></aug><source>PLoS Genet</source><pubdate>2010</pubdate><volume>6</volume><fpage>e1001152</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.1001152</pubid><pubid idtype="pmcid">2951372</pubid><pubid idtype="pmpid" link="fulltext">20949070</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Control of developmental regulators by Polycomb in human embryonic stem cells.</p></title><aug><au><snm>Lee</snm><fnm>TI</fnm></au><au><snm>Jenner</snm><fnm>RG</fnm></au><au><snm>Boyer</snm><fnm>LA</fnm></au><au><snm>Guenther</snm><fnm>MG</fnm></au><au><snm>Levine</snm><fnm>SS</fnm></au><au><snm>Kumar</snm><fnm>RM</fnm></au><au><snm>Chevalier</snm><fnm>B</fnm></au><au><snm>Johnstone</snm><fnm>SE</fnm></au><au><snm>Cole</snm><fnm>MF</fnm></au><au><snm>Isono</snm><fnm>K</fnm></au><au><snm>Koseki</snm><fnm>H</fnm></au><au><snm>Fuchikami</snm><fnm>T</fnm></au><au><snm>Abe</snm><fnm>K</fnm></au><au><snm>Murray</snm><fnm>HL</fnm></au><au><snm>Zucker</snm><fnm>JP</fnm></au><au><snm>Yuan</snm><fnm>B</fnm></au><au><snm>Bell</snm><fnm>GW</fnm></au><au><snm>Herbolsheimer</snm><fnm>E</fnm></au><au><snm>Hannett</snm><fnm>NM</fnm></au><au><snm>Sun</snm><fnm>K</fnm></au><au><snm>Odom</snm><fnm>DT</fnm></au><au><snm>Otte</snm><fnm>AP</fnm></au><au><snm>Volkert</snm><fnm>TL</fnm></au><au><snm>Bartel</snm><fnm>DP</fnm></au><au><snm>Melton</snm><fnm>DA</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au><au><snm>Young</snm><fnm>R</fnm></au></aug><source>Cell</source><pubdate>2006</pubdate><volume>125</volume><fpage>301</fpage><lpage>313</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2006.02.043</pubid><pubid idtype="pmpid" link="fulltext">16630818</pubid></pubidlist></xrefbib></bibl>
<bibl id="B11"><title><p>Polycomb complexes repress developmental regulators in murine embryonic stem cells.</p></title><aug><au><snm>Boyer</snm><fnm>LA</fnm></au><au><snm>Plath</snm><fnm>K</fnm></au><au><snm>Zeitlinger</snm><fnm>J</fnm></au><au><snm>Brambrink</snm><fnm>T</fnm></au><au><snm>Medeiros</snm><fnm>LA</fnm></au><au><snm>Lee</snm><fnm>TI</fnm></au><au><snm>Levine</snm><fnm>SS</fnm></au><au><snm>Wernig</snm><fnm>M</fnm></au><au><snm>Tajonar</snm><fnm>A</fnm></au><au><snm>Ray</snm><fnm>MK</fnm></au><au><snm>Bell</snm><fnm>GW</fnm></au><au><snm>Otte</snm><fnm>AP</fnm></au><au><snm>Vidal</snm><fnm>M</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au></aug><source>Nature</source><pubdate>2006</pubdate><volume>441</volume><fpage>349</fpage><lpage>353</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature04733</pubid><pubid idtype="pmpid" link="fulltext">16625203</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Genome-wide analysis of Polycomb targets in <it>Drosophila melanogaster</it>.</p></title><aug><au><snm>Schwartz</snm><fnm>YB</fnm></au><au><snm>Kahn</snm><fnm>TG</fnm></au><au><snm>Nix</snm><fnm>DA</fnm></au><au><snm>Li</snm><fnm>X-Y</fnm></au><au><snm>Bourgon</snm><fnm>R</fnm></au><au><snm>Biggin</snm><fnm>M</fnm></au><au><snm>Pirrotta</snm><fnm>V</fnm></au></aug><source>Nat Genet</source><pubdate>2006</pubdate><volume>38</volume><fpage>700</fpage><lpage>705</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1817</pubid><pubid idtype="pmpid" 
link="fulltext">16732288</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Rapid divergence in expression between duplicate genes inferred from microarray data.</p></title><aug><au><snm>Gu</snm><fnm>Z</fnm></au><au><snm>Nicolae</snm><fnm>D</fnm></au><au><snm>Lu</snm><fnm>HH-S</fnm></au><au><snm>Li</snm><fnm>WH</fnm></au></aug><source>Trends Genet</source><pubdate>2002</pubdate><volume>18</volume><fpage>609</fpage><lpage>613</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-9525(02)02837-8</pubid><pubid idtype="pmpid" link="fulltext">12446139</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Divergence in the spatial pattern of gene expression between human duplicate genes.</p></title><aug><au><snm>Makova</snm><fnm>KD</fnm></au><au><snm>Li</snm><fnm>W-H</fnm></au></aug><source>Genome Res</source><pubdate>2003</pubdate><volume>13</volume><fpage>1638</fpage><lpage>1645</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.1133803</pubid><pubid idtype="pmcid">403737</pubid><pubid idtype="pmpid" link="fulltext">12840042</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Functional divergence of duplicated genes formed by polyploidy during <it>Arabidopsis </it>evolution.</p></title><aug><au><snm>Blanc</snm><fnm>G</fnm></au><au><snm>Wolfe</snm><fnm>KH</fnm></au></aug><source>Plant Cell</source><pubdate>2004</pubdate><volume>16</volume><fpage>1679</fpage><lpage>1691</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.021410</pubid><pubid idtype="pmcid">514153</pubid><pubid idtype="pmpid" link="fulltext">15208398</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Transcriptional similarities, dissimilarities, and conservation of cis-elements in duplicated genes of <it>Arabidopsis</it>.</p></title><aug><au><snm>Haberer</snm><fnm>G</fnm></au><au><snm>Hindemitt</snm><fnm>T</fnm></au><au><snm>Meyers</snm><fnm>BC</fnm></au><au><snm>Mayer</snm><fnm>KFX</fnm></au></aug><source>Plant 
Physiol</source><pubdate>2004</pubdate><volume>136</volume><fpage>3009</fpage><lpage>3022</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.104.046466</pubid><pubid idtype="pmcid">523363</pubid><pubid idtype="pmpid" link="fulltext">15489284</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Nonrandom divergence of gene expression following gene and genome duplications in the flowering plant <it>Arabidopsis </it>thaliana.</p></title><aug><au><snm>Casneuf</snm><fnm>T</fnm></au><au><snm>De Bodt</snm><fnm>S</fnm></au><au><snm>Raes</snm><fnm>J</fnm></au><au><snm>Maere</snm><fnm>S</fnm></au><au><snm>Van de Peer</snm><fnm>Y</fnm></au></aug><source>Genome Biol</source><pubdate>2006</pubdate><volume>7</volume><fpage>R13</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2006-7-2-r13</pubid><pubid idtype="pmcid">1431724</pubid><pubid idtype="pmpid" link="fulltext">16507168</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Divergence in expression between duplicated genes in <it>Arabidopsis</it>.</p></title><aug><au><snm>Ganko</snm><fnm>EW</fnm></au><au><snm>Meyers</snm><fnm>BC</fnm></au><au><snm>Vision</snm><fnm>TJ</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2007</pubdate><volume>24</volume><fpage>2298</fpage><lpage>2309</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/molbev/msm158</pubid><pubid idtype="pmpid" link="fulltext">17670808</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events.</p></title><aug><au><snm>Bowers</snm><fnm>JE</fnm></au><au><snm>Chapman</snm><fnm>BA</fnm></au><au><snm>Rong</snm><fnm>J</fnm></au><au><snm>Paterson</snm><fnm>AH</fnm></au></aug><source>Nature</source><pubdate>2003</pubdate><volume>422</volume><fpage>433</fpage><lpage>438</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature01521</pubid><pubid idtype="pmpid" link="fulltext">12660784</pubid></pubidlist></xrefbib></bibl><bibl 
id="B20"><title><p>Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution.</p></title><aug><au><snm>Drummond</snm><fnm>DA</fnm></au><au><snm>Wilke</snm><fnm>CO</fnm></au></aug><source>Cell</source><pubdate>2008</pubdate><volume>134</volume><fpage>341</fpage><lpage>352</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2008.05.042</pubid><pubid idtype="pmcid">2696314</pubid><pubid idtype="pmpid" link="fulltext">18662548</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Why highly expressed proteins evolve slowly.</p></title><aug><au><snm>Drummond</snm><fnm>DA</fnm></au><au><snm>Bloom</snm><fnm>JD</fnm></au><au><snm>Adami</snm><fnm>C</fnm></au><au><snm>Wilke</snm><fnm>CO</fnm></au><au><snm>Arnold</snm><fnm>FH</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2005</pubdate><volume>102</volume><fpage>14338</fpage><lpage>14343</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0504070102</pubid><pubid idtype="pmcid">1242296</pubid><pubid idtype="pmpid" link="fulltext">16176987</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p><it>Arabidopsis </it>REF6 is a histone H3 lysine 27 demethylase.</p></title><aug><au><snm>Lu</snm><fnm>F</fnm></au><au><snm>Cui</snm><fnm>X</fnm></au><au><snm>Zhang</snm><fnm>S</fnm></au><au><snm>Jenuwein</snm><fnm>T</fnm></au><au><snm>Cao</snm><fnm>X</fnm></au></aug><source>Nat Genet</source><pubdate>2011</pubdate><volume>43</volume><fpage>715</fpage><lpage>719</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.854</pubid><pubid idtype="pmpid" link="fulltext">21642989</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Tissue-specific expression of FLOWERING LOCUS T in <it>Arabidopsis </it>is maintained independently of polycomb group protein 
repression.</p></title><aug><au><snm>Farrona</snm><fnm>S</fnm></au><au><snm>Thorpe</snm><fnm>FL</fnm></au><au><snm>Engelhorn</snm><fnm>J</fnm></au><au><snm>Adrian</snm><fnm>J</fnm></au><au><snm>Dong</snm><fnm>X</fnm></au><au><snm>Sarid-Krebs</snm><fnm>L</fnm></au><au><snm>Goodrich</snm><fnm>J</fnm></au><au><snm>Turck</snm><fnm>F</fnm></au></aug><source>Plant Cell</source><pubdate>2011</pubdate><volume>23</volume><fpage>3204</fpage><lpage>3214</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.111.087809</pubid><pubid idtype="pmcid">3203448</pubid><pubid idtype="pmpid" link="fulltext">21917549</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>CORNET: a user-friendly tool for data mining and integration.</p></title><aug><au><snm>De Bodt</snm><fnm>S</fnm></au><au><snm>Carvajal</snm><fnm>D</fnm></au><au><snm>Hollunder</snm><fnm>J</fnm></au><au><snm>Van den Cruyce</snm><fnm>J</fnm></au><au><snm>Movahedi</snm><fnm>S</fnm></au><au><snm>Inz&#233;</snm><fnm>D</fnm></au></aug><source>Plant Physiol</source><pubdate>2010</pubdate><volume>152</volume><fpage>1167</fpage><lpage>1179</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.109.147215</pubid><pubid idtype="pmcid">2832254</pubid><pubid idtype="pmpid" link="fulltext">20053712</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Misfolded proteins impose a dosage-dependent fitness cost and trigger a cytosolic unfolded protein response in yeast.</p></title><aug><au><snm>Geiler-Samerotte</snm><fnm>KA</fnm></au><au><snm>Dion</snm><fnm>MF</fnm></au><au><snm>Budnik</snm><fnm>BA</fnm></au><au><snm>Wang</snm><fnm>SM</fnm></au><au><snm>Hartl</snm><fnm>DL</fnm></au><au><snm>Drummond</snm><fnm>DA</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2011</pubdate><volume>108</volume><fpage>680</fpage><lpage>685</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.1017570108</pubid><pubid idtype="pmcid">3021021</pubid><pubid idtype="pmpid" 
link="fulltext">21187411</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Factors that contribute to variation in evolutionary rate among <it>Arabidopsis </it>genes.</p></title><aug><au><snm>Yang</snm><fnm>L</fnm></au><au><snm>Gaut</snm><fnm>BS</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2011</pubdate><volume>28</volume><fpage>2359</fpage><lpage>2369</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/molbev/msr058</pubid><pubid idtype="pmpid" link="fulltext">21389272</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Effects of gene expression on molecular evolution in <it>Arabidopsis </it>thaliana and <it>Arabidopsis </it>lyrata.</p></title><aug><au><snm>Wright</snm><fnm>SI</fnm></au><au><snm>Yau</snm><fnm>CBK</fnm></au><au><snm>Looseley</snm><fnm>M</fnm></au><au><snm>Meyers</snm><fnm>BC</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2004</pubdate><volume>21</volume><fpage>1719</fpage><lpage>1726</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/molbev/msh191</pubid><pubid idtype="pmpid" link="fulltext">15201397</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>A gene expression map of <it>Arabidopsis </it>thaliana development.</p></title><aug><au><snm>Schmid</snm><fnm>M</fnm></au><au><snm>Davison</snm><fnm>TS</fnm></au><au><snm>Henz</snm><fnm>SR</fnm></au><au><snm>Pape</snm><fnm>UJ</fnm></au><au><snm>Demar</snm><fnm>M</fnm></au><au><snm>Vingron</snm><fnm>M</fnm></au><au><snm>Sch&#246;lkopf</snm><fnm>B</fnm></au><au><snm>Weigel</snm><fnm>D</fnm></au><au><snm>Lohmann</snm><fnm>JU</fnm></au></aug><source>Nat Genet</source><pubdate>2005</pubdate><volume>37</volume><fpage>501</fpage><lpage>506</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1543</pubid><pubid idtype="pmpid" link="fulltext">15806101</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>cis-Regulatory and protein evolution in orthologous and duplicate 
genes.</p></title><aug><au><snm>Castillo-Davis</snm><fnm>CI</fnm></au><au><snm>Hartl</snm><fnm>DL</fnm></au><au><snm>Achaz</snm><fnm>G</fnm></au></aug><source>Genome Res</source><pubdate>2004</pubdate><volume>14</volume><fpage>1530</fpage><lpage>1536</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.2662504</pubid><pubid idtype="pmcid">509261</pubid><pubid idtype="pmpid" link="fulltext">15256508</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>G-boxes, bigfoot genes, and environmental response: characterization of intragenomic conserved noncoding sequences in <it>Arabidopsis</it>.</p></title><aug><au><snm>Freeling</snm><fnm>M</fnm></au><au><snm>Rapaka</snm><fnm>L</fnm></au><au><snm>Lyons</snm><fnm>E</fnm></au><au><snm>Pedersen</snm><fnm>B</fnm></au><au><snm>Thomas</snm><fnm>BC</fnm></au></aug><source>Plant Cell</source><pubdate>2007</pubdate><volume>19</volume><fpage>1441</fpage><lpage>1457</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.107.050419</pubid><pubid idtype="pmcid">1913728</pubid><pubid idtype="pmpid" link="fulltext">17496117</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Comparative assessment of large-scale data sets of protein-protein interactions.</p></title><aug><au><snm>von Mering</snm><fnm>C</fnm></au><au><snm>Krause</snm><fnm>R</fnm></au><au><snm>Snel</snm><fnm>B</fnm></au><au><snm>Cornell</snm><fnm>M</fnm></au><au><snm>Oliver</snm><fnm>SG</fnm></au><au><snm>Fields</snm><fnm>S</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>417</volume><fpage>399</fpage><lpage>403</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12000970</pubid></xrefbib></bibl><bibl id="B32"><title><p>A probabilistic functional network of yeast 
genes.</p></title><aug><au><snm>Lee</snm><fnm>I</fnm></au><au><snm>Date</snm><fnm>SV</fnm></au><au><snm>Adai</snm><fnm>AT</fnm></au><au><snm>Marcotte</snm><fnm>EM</fnm></au></aug><source>Science</source><pubdate>2004</pubdate><volume>306</volume><fpage>1555</fpage><lpage>1558</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1099511</pubid><pubid idtype="pmpid" link="fulltext">15567862</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Body-methylated genes in <it>Arabidopsis </it>thaliana are functionally important and evolve slowly.</p></title><aug><au><snm>Takuno</snm><fnm>S</fnm></au><au><snm>Gaut</snm><fnm>BS</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2012</pubdate><volume>29</volume><fpage>219</fpage><lpage>227</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/molbev/msr188</pubid><pubid idtype="pmpid" link="fulltext">21813466</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Following tetraploidy in an <it>Arabidopsis </it>ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes.</p></title><aug><au><snm>Thomas</snm><fnm>BC</fnm></au><au><snm>Pedersen</snm><fnm>B</fnm></au><au><snm>Freeling</snm><fnm>M</fnm></au></aug><source>Genome Res</source><pubdate>2006</pubdate><volume>16</volume><fpage>934</fpage><lpage>946</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.4708406</pubid><pubid idtype="pmcid">1484460</pubid><pubid idtype="pmpid" link="fulltext">16760422</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Transcriptional regulation of <it>Arabidopsis </it>LEAFY COTYLEDON2 involves RLE, a cis-element that regulates trimethylation of histone H3 at lysine-27.</p></title><aug><au><snm>Berger</snm><fnm>N</fnm></au><au><snm>Dubreucq</snm><fnm>B</fnm></au><au><snm>Roudier</snm><fnm>F</fnm></au><au><snm>Dubos</snm><fnm>C</fnm></au><au><snm>Lepiniec</snm><fnm>L</fnm></au></aug><source>Plant Cell 
Online</source><pubdate>2011</pubdate><volume>23</volume><fpage>4065</fpage><lpage>4078</lpage><xrefbib><pubid idtype="doi">10.1105/tpc.111.087866</pubid></xrefbib></bibl><bibl id="B36"><title><p>TAIR</p></title><url>http://www.arabidopsis.org</url></bibl><bibl id="B37"><title><p>EMBOSS: the European Molecular Biology Open Software Suite.</p></title><aug><au><snm>Rice</snm><fnm>P</fnm></au><au><snm>Longden</snm><fnm>I</fnm></au><au><snm>Bleasby</snm><fnm>A</fnm></au></aug><source>Trends Genet</source><pubdate>2000</pubdate><volume>16</volume><fpage>276</fpage><lpage>277</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid><pubid idtype="pmpid" link="fulltext">10827456</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>TreeSoft:TreeBeST</p></title><url>http://treesoft.sourceforge.net/treebest.shtml</url></bibl><bibl id="B39"><title><p>PAML 4: phylogenetic analysis by maximum likelihood.</p></title><aug><au><snm>Yang</snm><fnm>Z</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2007</pubdate><volume>24</volume><fpage>1586</fpage><lpage>1591</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/molbev/msm088</pubid><pubid idtype="pmpid" link="fulltext">17483113</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm>
</art>