Additional file 8.

Examples of domain gains by joining of exons from adjacent genes.

(a) TreeFam family TF323983 contains Cadherin EGF LAG seven-pass G-type receptor (CELSR) precursor genes. One branch of the family, containing vertebrate genes, has gained the Sulfate transport and STAS domains in addition to the ancestral cadherin, EGF and other extracellular domains. The gain occurred after the other vertebrates diverged from fish, and homologues without the gained domains are present in all animals. A representative for the gain is the transcript CELSR3-207 (ENST00000383733); its 3' end is shown on the left-hand side (the whole transcript is too long to be presented clearly). The right-hand side shows a gene that is the plausible donor of these domains: SLC26A4 (ENSG00000091137) contains both domains, and its STAS domain is 31% identical to that in the CELSR3 gene. The alignment with the zebrafish genome is shown below the CELSR3-207 transcript. Yellow arrows represent the alignment with zebrafish chromosome 8, and pink arrows the alignment with zebrafish chromosome 6 (information taken from the UCSC browser). The alignment with the fish genome shows that synteny is broken exactly in the region where the new domains are gained. Therefore, the plausible scenario for the domain gain involves gene duplication, recombination and joining of newly adjacent exons.

(b) Another example of a domain gain after gene duplication and exon joining. Family TF334740 in the TreeFam database contains genes that code for the Rho-guanine nucleotide exchange factor (RhoGEF). However, the RhoGEF domain was not present in the ancestral protein; it was inserted later, together with the C1_1 domain, when mammals diverged from other vertebrates (TreeFam release 6.0, which we used in the analysis, had chicken, fish and frog genes without the gained domains). The representative transcript for the gain event is AC093283.3-201 (ENST00000296794). The gene ARHGEF18 (ENSG00000104880) has both of these domains, and the RhoGEF domains of the two genes are 52% identical. Hence, ARHGEF18 is a plausible donor for this gain event. Again, the mechanism for the gain of these domains most likely involves gene duplication and exon joining.

(c) An example of a domain gain after segmental duplication and exon joining. TreeFam family TF351422 contains only primate genes, and after a gene duplication event one branch of the family has gained the PTEN_C2 domain. A representative transcript for this gain is AL354798.13-202 (ENST00000381866). A few segmental duplications span the gene AL354798.13, and one of them covers only the ancestral portion of the gene, without the gained domain. The paired copy of that segmental duplication lies on the gene's paralogue that has not gained the domain, AP000365.1 (ENSG00000206249). Hence, a possible scenario is that a recent duplication of a paralogous gene changed its genetic environment and brought it into proximity of the PTEN_C2 domain, which subsequently became part of the gene.

(d) Another example of a gain of a domain-coding region by segmental duplication followed by exon joining. In the TF340491 family of vertebrate proteins that contain the KRAB domain, a branch with primate genes has gained the additional HATPase_c domain. The representative transcript is the human PMS2L3-202 (ENST00000275580). The HATPase_c domain exists in the gene PMS2 (ENSG00000122512), and at the protein level the gained domain is 98% identical to the sequence in the protein product of the PMS2 transcript PMS2-001. A segmental duplication spans the gained sequence in the transcript PMS2L3-202 and is paired with the segmental duplication that covers the same domain in the gene PMS2. The paired segmental duplication regions are shown as grey boxes connected by arrows. Therefore, the mechanism underlying this gain appears to be a segmental duplication of sequence belonging to PMS2, after which the copy located next to the ancestor of PMS2L3-202 was joined with it. An important caveat is that PMS2L3-202 has a structure that can be targeted by nonsense-mediated decay (NMD).

Format: EPS Size: 1.2MB

Buljan et al. Genome Biology 2010 11:R74   doi:10.1186/gb-2010-11-7-r74