<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2010-11-7-126</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research highlight</dochead>
      <bibl>
         <title>
            <p>How do proteins gain new domains?</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Marsh</snm>
               <mi>A</mi>
               <fnm>Joseph</fnm>
               <insr iid="I1"/>
            </au>
            <au ca="yes" id="A2">
               <snm>Teichmann</snm>
               <mi>A</mi>
               <fnm>Sarah</fnm>
               <insr iid="I1"/>
               <email>sat@mrc-lmb.cam.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>7</issue>
         <fpage>126</fpage>
         <url>http://genomebiology.com/2010/11/7/126</url>
         <xrefbib>
            
         <pubidlist><pubid idtype="pmpid">20630117</pubid><pubid idtype="doi">10.1186/gb-2010-11-7-126</pubid></pubidlist></xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>15</day>
               <month>7</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>BioMed Central Ltd</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
             <p>A study of the contributions of different mechanisms of domain gain in animal proteins suggests that gene fusion is likely to be the most frequent.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="30010008" subtype="man_spc_id" type="BMC">Evolution</classification>
         <classification id="30010009" subtype="man_spc_id" type="BMC">Genetics</classification>
         <classification id="300100010" subtype="man_spc_id" type="BMC">Genome studies</classification>
         <classification id="300100016" subtype="man_spc_id" type="BMC">Molecular biology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Research highlight</p>
         </st>
          <p>Domains are evolutionarily conserved regions of proteins with generally independent structural and functional properties. Although only a fairly limited set of domains has arisen during evolution, combining these domains in different ways has produced the huge number of protein domain architectures observed today. These multidomain proteins have diverse functions that rely on the collective properties of their component domains. A key to understanding protein evolution is therefore to understand how multidomain proteins gain, lose and rearrange domains. A considerable body of literature has been dedicated to inferring these mechanisms from amino acid sequence and domain architecture information <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. In a study in this issue of <it>Genome Biology</it>, Buljan <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> have addressed the question from a new perspective: they investigated the relative contributions of different molecular genetic mechanisms of domain acquisition to the evolution of animal proteins, as inferred from gene structure at the nucleotide level.</p>
          <p>The availability of a large number of fully sequenced genomes in recent years has provided significant insight into the evolution of domain architectures in multidomain proteins. The tendency of proteins to exist in multidomain combinations differs greatly between branches of the evolutionary tree, with eukaryotes generally having a greater proportion of multidomain proteins <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Animal proteins are particularly interesting, as the creation of multidomain proteins and the rate of domain rearrangements appear to have increased substantially in the recent metazoan lineage <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Protein-domain families vary widely in their propensity to combine with other domains: most combine with very few other domains, whereas some form a large number of combinations <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Most evolutionary changes to multidomain protein architectures occur at the amino and carboxyl termini, in the form of insertions of new domains, domain repetitions and domain deletions <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Recent modeling at the protein-sequence level suggests that the evolution of most protein-domain architectures can be explained by a series of simple steps, and that complex rearrangements are rare <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Mechanisms for domain acquisition</p>
         </st>
          <p>Proteins can acquire new domains by various mechanisms. Gene fusion, in which two adjacent genes become joined, is a major mechanism of multidomain protein formation in bacteria <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In eukaryotes, however, the mechanisms of domain gain are more varied, primarily because of their complex exon-intron gene structures. Although gene fusion is also important in eukaryotes, it typically does not involve the direct joining of exons from adjacent genes. Instead, splicing patterns are modified so that a single fused transcript is produced from exons that remain separated in the genome (Figure <figr fid="F1">1a</figr>). Interestingly, the rate of gene fusion appears to be considerably greater than that of the opposite process, gene fission, in which a single gene splits into two <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Possible mechanisms for the gain of protein domains</p>
            </caption>
            <text>
                <p><b>Possible mechanisms for the gain of protein domains</b>. Colored blocks represent exons, with red, blue and green indicating exons coding for different domains. Solid black lines represent introns and red lines indicate intergenic regions. <b>(a) </b>Gene fusion. The noncoding region between two genes is modified so that the exons of the first gene become spliced with those of the second. <b>(b) </b>Exon extension. The noncoding region following an exon becomes part of the exon and codes for a new domain. <b>(c) </b>Exon recombination. The exons of two genes become directly joined. <b>(d) </b>Intron recombination. An exon from one gene is inserted into the intron of another. <b>(e) </b>Retroposition. A retrotransposon sequence (RT, purple) mediates the copying of itself and a neighboring gene region via an mRNA intermediate, followed by insertion into another gene.</p>
            </text>
            <graphic file="gb-2010-11-7-126-1" hint_layout="double"/>
         </fig>
         <p>A different mechanism for domain gain involves the extension of an exon into a noncoding region (Figure <figr fid="F1">1b</figr>). One might presume this mechanism to be extremely rare, given that expression of a previously noncoding sequence would seem unlikely to result in a functional polypeptide. Buljan <it>et al</it>. have specifically addressed this mechanism, as we discuss later.</p>
          <p>Other mechanisms of protein domain gain involve recombination. For example, exons from two different genes could be directly joined (Figure <figr fid="F1">1c</figr>). Alternatively, exons from one gene could be inserted into the introns of another (Figure <figr fid="F1">1d</figr>). Intron recombination is often referred to as exon shuffling, and has been speculated to be one of the main drivers of the diversity of domain architectures in complex eukaryotes <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. An important role for intron recombination in domain rearrangements is supported by the observations that domain boundaries correlate significantly with exon boundaries, and that most exons corresponding to domains are flanked by introns of symmetric phase (that is, both flanking introns interrupt the codon triplets at the same position, so the exon can be inserted or removed without shifting the reading frame) <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
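          <p>To make the notion of symmetric phase concrete, the following Python sketch computes intron phases from the lengths of a gene's coding exons and flags internal exons flanked by introns of equal phase. It is an illustration only, not the procedure used in the cited studies; the input format (a list of coding-exon lengths in transcript order, starting at the start codon) and the function names intron_phases and symmetric_internal_exons are hypothetical.</p>

```python
"""Sketch: intron phases and exon symmetry from coding-exon lengths.

Assumption (for illustration only): a gene is given as a list of the
lengths, in nucleotides, of its coding exons in transcript order, starting
at the start codon. Real analyses work from genome annotations.
"""


def intron_phases(exon_lengths: list[int]) -> list[int]:
    """Return the phase (0, 1 or 2) of each intron between consecutive exons.

    Phase 0: the intron falls between two codons.
    Phase 1: the intron falls after the first nucleotide of a codon.
    Phase 2: the intron falls after the second nucleotide of a codon.
    """
    phases = []
    coding_so_far = 0
    for length in exon_lengths[:-1]:  # no intron after the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


def symmetric_internal_exons(exon_lengths: list[int]) -> list[int]:
    """Return indices of internal exons whose two flanking introns share a phase.

    Such an exon spans a whole number of codons, so it can be inserted into,
    or removed from, an intron of the same phase without shifting the
    downstream reading frame.
    """
    phases = intron_phases(exon_lengths)
    # Internal exon i (1 .. n-2) sits between intron i-1 and intron i.
    return [
        i
        for i in range(1, len(exon_lengths) - 1)
        if phases[i - 1] == phases[i]
    ]
```

          <p>For example, for hypothetical coding exons of 100, 90, 50 and 120 nucleotides, the three introns have phases 1, 1 and 0, so only the 90-nucleotide exon (index 1) is symmetric. Equivalently, an internal exon is symmetric exactly when its length is a multiple of three.</p>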
         <p>Retrotransposons are genetic elements that can replicate and insert themselves at other genomic locations. This provides another possible mechanism for protein domain gain, as retrotransposons can also copy regions of genes and insert them into other genes (Figure <figr fid="F1">1e</figr>). Notably, because retroposition occurs via an mRNA intermediate, an inserted region will lack the introns of the gene from which it originated.</p>
      </sec>
      <sec>
         <st>
            <p>Assessing the relative contributions of domain-gain mechanisms</p>
         </st>
         <p>Although the actual physical events behind most domain gains may be more complex than presented in Figure <figr fid="F1">1</figr>, these mechanisms provide a simple framework by which the majority of protein domain gains can be explained. However, despite the recent work on multidomain protein evolution at the amino acid level, there has been little investigation of the extent to which the different molecular genetic mechanisms have contributed to the current diversity of multidomain protein architectures in complex eukaryotes. This is the question that Buljan <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> have set out to address.</p>
          <p>The authors started by compiling a set of putative domain-gain events. These were identified by examining the domain assignments and phylogenetic relationships between genes from a large number of fully sequenced genomes. As previous work has shown that the identification of evolutionary changes in domain architectures can be sensitive to erroneous annotations <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, the authors applied very stringent selection criteria so that the identified gains were likely to be true domain-gain events rather than domain losses or artifacts of the genome or domain annotation procedures. Thus, although this procedure is likely to miss some true gains, the final set, containing 330 high-confidence domain-gain events, should include very few false positives.</p>
          <p>The key to assessing the relative contributions of different domain-gain mechanisms is that different mechanisms should leave distinct genomic traces. For example, a domain gained by retroposition is likely to be encoded by a single exon, because the retrotransposon copies the gene region via a spliced mRNA intermediate. Gained domains encoded by multiple exons are therefore unlikely to have been acquired by retroposition. Other mechanisms, including gene fusion and exon recombination, are much more likely to act at protein termini, whereas intron recombination can only insert a domain in the middle of a protein. The location of the gained domain can therefore be used to infer which mechanisms were likely responsible for the gain. Finally, for all gained domains, the authors searched for homologs within the same genomes to identify potential 'donor' genes. This indicates whether gene duplication preceded the domain gain and can identify potential source genes for retroposition.</p>
          <p>A primary finding of this study <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> was that most domain gains (71% of the total) occurred at the amino or carboxyl termini of proteins, and that many of these gains involved multiple exons. Gene fusion is the only plausible mechanism that can account for the 32% of gains that occurred at termini and involved multiple exons. In addition, gene fusion is likely to have caused many of the remaining 39% of gains that occurred at termini, although other mechanisms cannot be excluded in these cases. These results strongly suggest that gene fusion is the most important mechanism of domain gain in animals. Of course, fusion can only occur between genes that are adjacent on the chromosome. The authors found no evidence that any of the fused genes had previously existed as adjacent, unfused genes, so an additional mechanism would be required to juxtapose the genes before fusion. In at least 80% of domain-gain events, there was evidence that either the donor gene or the gene that acquired the domain had been duplicated before the gain. Moreover, when a donor gene could be identified in the same genome, it was located on the same chromosome as the gained domain in a significant fraction of cases. This suggests that nonallelic homologous recombination, which favors recombination within the same chromosome, is the likely mechanism for bringing separate genes together.</p>
          <p>Although recombination between introns has been speculated to be one of the main mechanisms behind the diverse domain rearrangements observed in complex eukaryotes <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, it seems to have made a fairly limited contribution to the domain-gain events studied by Buljan <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Only 10% of the gained domains were both internally located and flanked by introns of symmetric phase, the signature expected if they had been gained by intron recombination. Thus, although intron recombination has probably played a very important role in the evolution of some multidomain proteins, it has contributed to far fewer domain gains than has gene fusion.</p>
          <p>Gained domains that were encoded by single exons and for which potential donor genes could be identified are likely candidates for retroposition. Few gains fit these criteria, and manual inspection revealed just one case in which a retrotransposon sequence was present in the donor gene. Thus, the authors <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> suggest that retroposition underlies only a small fraction of domain gains in animal proteins. However, they do note a high percentage of single-exon domain gains in insects, which hints that retroposition may have played different roles in different lineages.</p>
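          <p>The decision rules described in the preceding paragraphs can be summarized as a small rule table. The Python sketch below encodes them for illustration only; it is not the authors' pipeline, which also involved stringent filtering and manual inspection. The DomainGain record, its field names and the label strings are hypothetical, and a gain can carry more than one label because the rules are not mutually exclusive.</p>

```python
"""Sketch of the trace-based rules for attributing a domain gain to mechanisms.

Hypothetical encoding for illustration; not the procedure of the cited study.
"""
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainGain:
    """Hypothetical summary of one inferred domain-gain event."""
    terminal: bool                    # gained at the amino or carboxyl terminus
    n_exons: int                      # number of exons encoding the gained domain
    symmetric_flanking_introns: bool  # both flanking introns share a phase
    has_donor: bool                   # homologous 'donor' gene found in the same genome


def classify_gain(gain: DomainGain) -> set[str]:
    """Return the set of mechanism labels consistent with a gain's traces."""
    labels = set()
    if gain.terminal:
        if gain.n_exons > 1:
            # Multi-exon terminal gain: fusion is the only plausible mechanism.
            labels.add("gene fusion")
        else:
            # Terminal gain without multiple exons: fusion likely, others possible.
            labels.add("terminal, fusion likely")
    elif gain.symmetric_flanking_introns:
        # Internal domain flanked by introns of equal phase.
        labels.add("intron recombination")
    if gain.n_exons == 1 and gain.has_donor:
        # Retroposition yields intronless copies; a real analysis would still
        # check the donor gene for a retrotransposon sequence.
        labels.add("retroposition candidate")
    return labels or {"unassigned"}


def label_fractions(gains: list[DomainGain]) -> dict[str, float]:
    """Fraction of gains carrying each label (labels can overlap)."""
    counts = Counter(label for g in gains for label in classify_gain(g))
    return {label: n / len(gains) for label, n in counts.items()}
```

          <p>Because the labels overlap (a single-exon terminal gain with an identified donor gene is both a likely fusion and a retroposition candidate), the fractions returned by label_fractions need not sum to one.</p>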
          <p>A very interesting finding from this study relates to the frequency of intrinsically disordered regions in the gained domains. Intrinsically disordered regions of proteins lack a stable folded structure, and have recently garnered significant attention because of their numerous important biological functions and their association with various human diseases <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. The authors noted that the fraction of residues predicted to be intrinsically disordered was significantly greater in gained domains than in other domains. Domains encoded by exon extensions showed a particularly dramatic enrichment in disorder, suggesting that these disordered regions originated from previously noncoding sequences that became exonized. This study therefore has important implications both for understanding the origin of intrinsically disordered protein sequences and for helping to explain why so many proteins in complex eukaryotes possess intrinsically disordered regions. Figure <figr fid="F2">2</figr> shows a hypothetical example of a protein with multiple folded domains gaining an intrinsically disordered region at its carboxyl terminus via an exon extension.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Hypothetical model of a multidomain protein gaining an intrinsically disordered region via a carboxy-terminal exon extension</p>
            </caption>
            <text>
               <p><b>Hypothetical model of a multidomain protein gaining an intrinsically disordered region via a carboxy-terminal exon extension</b>. This protein has three folded domains (based on Protein Data Bank entry <ext-link ext-link-id="1BIB" ext-link-type="pdb">1BIB</ext-link>), colored yellow, blue and red, and a 40-residue disordered extension at its carboxyl terminus, colored green. The folded domains are shown as a surface representation, and the disordered region is shown as an ensemble model with multiple distinct structures representing its substantial conformational heterogeneity.</p>
            </text>
            <graphic file="gb-2010-11-7-126-2" hint_layout="double"/>
         </fig>
          <p>Inferring evolutionary mechanisms from genomic sequences separated by millions of years of divergence is inherently difficult, and Buljan <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> have done an admirable job of extracting the available information. However, considerable work remains to improve our understanding of the different domain-gain mechanisms. Evolution is complex, and a mixture of processes probably contributed to many domain gains and rearrangements. For example, although gene fusion is likely to be the dominant domain-gain mechanism, the recombination that precedes it relies on regions of sequence similarity that may have originated from retrotransposon activity. Moreover, the methods for classifying domain gains from sequences are imperfect, so the frequencies reported for the different mechanisms should be considered only rough estimates. Nonetheless, this study <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> provides strong support for the idea that most domain gains in animal proteins were directly mediated by gene fusion, preceded by duplication and recombination. Intron recombination and retroposition, on the other hand, appear to have been less important in recent evolutionary history. With the tremendous recent advances in next-generation sequencing technologies, the number of fully sequenced genomes is likely to increase vastly in the near future, which will allow the molecular genetic mechanisms of multidomain protein evolution to be studied in much more detail.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>JM is supported by an EMBO Long-Term Fellowship.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Domain combinations in archaeal, eubacterial and eukaryotic proteomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Apic</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>310</volume>
            <fpage>311</fpage>
            <lpage>325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4776</pubid>
                  <pubid idtype="pmpid" link="fulltext">11428892</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Quantification of the elevated rate of domain rearrangements in metazoa.</p>
            </title>
            <aug>
               <au>
                  <snm>Ekman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bj&#246;rklund</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Elofsson</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>372</volume>
            <fpage>1337</fpage>
            <lpage>1348</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2007.06.022</pubid>
                  <pubid idtype="pmpid" link="fulltext">17689563</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Domain deletions and substitutions in the modular protein evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Weiner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Beaussart</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bornberg-Bauer</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>FEBS J</source>
            <pubdate>2006</pubdate>
            <volume>273</volume>
            <fpage>2037</fpage>
            <lpage>2047</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1742-4658.2006.05220.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16640566</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Domain rearrangements in protein evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Bj&#246;rklund</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Ekman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Light</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Frey-Sk&#246;tt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elofsson</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>353</volume>
            <fpage>911</fpage>
            <lpage>923</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.08.067</pubid>
                  <pubid idtype="pmpid" link="fulltext">16198373</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Modeling the evolution of protein domain architectures using maximum parsimony.</p>
            </title>
            <aug>
               <au>
                  <snm>Fong</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Geer</snm>
                  <fnm>LY</fnm>
               </au>
               <au>
                  <snm>Panchenko</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>366</volume>
            <fpage>307</fpage>
            <lpage>315</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2006.11.017</pubid>
                  <pubid idtype="pmcid">1858635</pubid>
                  <pubid idtype="pmpid" link="fulltext">17166515</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Quantifying the mechanisms of domain gain in animal proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Buljan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frankish</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2010</pubdate>
            <volume>11</volume>
            <fpage>R74</fpage>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Gene fusion/fission is a major contributor to evolution of multi-domain bacterial proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Pasek</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Risler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Br&#233;zellec</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>1418</fpage>
            <lpage>1423</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl135</pubid>
                  <pubid idtype="pmpid" link="fulltext">16601004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Genome evolution and the evolution of exon-shuffling-a review.</p>
            </title>
            <aug>
               <au>
                  <snm>Patthy</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1999</pubdate>
            <volume>238</volume>
            <fpage>103</fpage>
            <lpage>114</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(99)00228-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">10570989</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Protein domains correlate strongly with exons in multiple eukaryotic genomes-evidence of exon shuffling?</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Grigoriev</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>399</fpage>
            <lpage>403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.06.013</pubid>
                  <pubid idtype="pmpid" link="fulltext">15313546</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Intrinsically unstructured proteins and their functions.</p>
            </title>
            <aug>
               <au>
                  <snm>Dyson</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>PE</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>197</fpage>
            <lpage>208</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm1589</pubid>
                  <pubid idtype="pmpid" link="fulltext">15738986</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>