<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2006-7-4-r33</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Mutation patterns of amino acid tandem repeats in the human proteome</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Mularoni</snm>
					<fnm>Loris</fnm>
					<insr iid="I1"/>
					<email>lmularoni@imim.es</email>
				</au>
				<au id="A2">
					<snm>Guig&#243;</snm>
					<fnm>Roderic</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>rguigo@imim.es</email>
				</au>
				<au id="A3" ca="yes">
					<snm>Alb&#224;</snm>
					<fnm>M Mar</fnm>
					<insr iid="I1"/>
					<email>malba@imim.es</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Research Unit on Biomedical Informatics, Institut Municipal d'Investigaci&#243; M&#232;dica, Universitat Pompeu Fabra, Barcelona 08003, Spain</p>
				</ins>
				<ins id="I2">
					<p>Centre de Regulaci&#243; Gen&#242;mica, Barcelona 08003, Spain</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2006</pubdate>
			<volume>7</volume>
			<issue>4</issue>
			<fpage>R33</fpage>
			<url>http://genomebiology.com/2006/7/4/R33</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">16640792</pubid><pubid idtype="doi">10.1186/gb-2006-7-4-r33</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>3</day>
					<month>2</month>
					<year>2006</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>17</day>
					<month>3</month>
					<year>2006</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>23</day>
					<month>3</month>
					<year>2006</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>26</day>
					<month>4</month>
					<year>2006</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2006</year>
			<collab>Mularoni et al.; licensee BioMed Central Ltd.</collab>
			<note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<shorttitle>
			<p>Human repeat mutations</p>
		</shorttitle>
		<shortabs>
			<p>A genome-wide report on the types of mutations occurring in amino acid repeats of human proteins shows that the mutational dynamics of different types of repeats are very diverse.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs).</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We identified 137 polymorphic variants in 115 different amino acid tandem repeats. Of these variants, 77 contained amino acid substitutions and 60 contained gaps (expansions or contractions of the repeat unit). The analysis indicated that about 21% or more of the repeats may be polymorphic in humans. We compared the mutations found in different types of amino acid repeats and in adjacent regions. Overall, repeats showed a five-fold increase in the number of gap mutations compared to adjacent regions, reflecting the action of slippage within the repetitive structures. The relative frequencies of gap and substitution mutations differed markedly among amino acid repeat types. Among repeats containing gap variants we identified several disease and candidate disease genes.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>This is the first report at a genome-wide scale of the types of mutations occurring in the amino acid repeat component of the human proteome. We show that the mutational dynamics of different amino acid repeat types are very diverse. We provide a list of loci with highly variable repeat structures, some of which may be involved in disease.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Single amino acid tandem repeats, also called homopolymeric amino acid tracts, are very abundant in eukaryotic proteins and are present in nearly one-fifth of human gene products <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. They can be encoded by runs of a single codon or by a mixture of synonymous codons. Pure runs of the same codon are expected to be susceptible to expansions and contractions of the core repetitive unit via slippage of trinucleotide repeat units <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Consistent with this, repeats that are poorly conserved in orthologous genes across different species are more often encoded by homogeneous codon tracts than repeats that are well conserved across species <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B5">5</abbr></abbrgrp>.</p>
			<p>It has been proposed that the high mutability associated with slippage may provide an evolutionary advantage in adaptation to new environments and in the rapid evolution of morphological traits <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. But slippage can also have pathogenic effects: the uncontrolled expansion of trinucleotide repeats within human coding sequences is associated with several neurodegenerative disorders. Examples are Huntington's disease and dentatorubral-pallidoluysian atrophy, both associated with abnormally long expansions of CAG runs encoding polyglutamine tracts (for reviews, see <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>). The high mutability of disease-associated repeats is reflected in high levels of repeat size polymorphism in the human population <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Detection of highly variable amino acid tandem repeats can thus help discover new loci that may be particularly prone to repeat expansions and could become pathogenic.</p>
			<p>Here we report on the mutations found in regions encoding amino acid tandem repeats in human genes using the human expressed sequence tag (EST) database. Of the 137 variants detected in 115 different repeats, each supported by at least 2 ESTs, almost half contain expansions or contractions of the amino acid repeat. We analyze the properties of repeats formed by different types of amino acids and identify a group of human genes that could potentially undergo expansions similar to those observed in the disease genes.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Survey of polymorphic amino acid repeats</p>
				</st>
				<p>We analyzed 33,860 human peptide sequences from the Ensembl database <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> for the presence of tandem amino acid repeats of size 5 or longer; 5,467 proteins (about 16%) contained at least one such tandem repeat. The most common amino acid repeat types (<it>n</it> &gt; 200) were glutamic acid (888), proline (883), alanine (681), serine (623), glycine (510), leucine (392), glutamine (273) and lysine (223). The average repeat size was similar for different amino acids (in the range 5.8 to 6.8) except for glutamine, whose repeats were longer (average 8.7).</p>
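				<p>The repeat scan described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: a tandem repeat is taken to be an uninterrupted run of a single amino acid of length 5 or more, found with a regular-expression backreference; the function names and toy sequences are hypothetical.</p>
				<p><![CDATA[
```python
import re

def find_homopolymer_runs(protein, min_len=5):
    """Return (residue, start, length) for every single-amino-acid run
    of at least min_len residues (0-based start)."""
    # ([A-Z]) captures one residue; \1{n,} requires n or more further copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end() - m.start())
            for m in pattern.finditer(protein)]

def fraction_with_repeats(proteins, min_len=5):
    """Fraction of proteins that contain at least one qualifying run."""
    hits = sum(1 for p in proteins if find_homopolymer_runs(p, min_len))
    return hits / len(proteins)

# Invented toy sequences, not Ensembl entries.
toy = ["MSEEEEEEKPQQQQQQQQR", "MKLAGPVST", "MAAAAAGG"]
print(find_homopolymer_runs(toy[0]))  # [('E', 2, 6), ('Q', 10, 8)]
print(fraction_with_repeats(toy))

# Proportion reported in the text: 5,467 of 33,860 proteins.
print(round(100 * 5467 / 33860, 1))  # 16.1
```
]]></p>
				<p>Because the quantifier is greedy, each maximal run is reported once, so a run of 10 identical residues counts as one repeat of length 10 rather than several overlapping repeats of length 5.</p>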
				<p>We mapped the human ESTs <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> to the repeat regions using TBLASTN <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. We selected the repeat regions (the tandem repeat plus 15 nucleotides of adjacent sequence on each side) that were covered by at least 4 different ESTs with &gt;90% identity matches. These comprised 2,227 repeat regions, about 41% of the initial ones, with an average coverage of 27.4 ESTs per repeat. Of these, 115 (5.2%) showed one or several polymorphic variants, each supported by at least 2 different ESTs (Table <tblr tid="T1">1</tblr>). The level of polymorphism varied among amino acids, from 2.4% in leucine repeats to 10.2% in glutamine repeats. When only repeats covered by 100 or more ESTs were considered, the fraction of polymorphic repeats rose from 5.2% to 21% (26 of 123).</p>
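				<p>The coverage and support thresholds above can be illustrated with a small sketch. It assumes, hypothetically, that each EST's alignment to a region has already been translated, reduced to a string and passed the identity filter; the helper names region_bounds and call_variants are illustrative and are not part of TBLASTN or any published tool.</p>
				<p><![CDATA[
```python
from collections import Counter

FLANK_NT = 15  # flanking nucleotides kept on each side of the repeat

def region_bounds(repeat_start, repeat_end, transcript_len):
    """0-based, end-exclusive nucleotide bounds of repeat plus flanks,
    clipped to the transcript."""
    return (max(0, repeat_start - FLANK_NT),
            min(transcript_len, repeat_end + FLANK_NT))

def call_variants(reference, est_segments, min_coverage=4, min_support=2):
    """Hypothetical per-region variant caller.

    est_segments holds the region as read in each EST, assumed to come from
    alignments that already passed the identity filter (over 90%).
    Returns None when coverage is below min_coverage; otherwise a dict
    mapping each non-reference sequence to its EST support, keeping only
    sequences seen in at least min_support ESTs.
    """
    if len(est_segments) < min_coverage:
        return None
    counts = Counter(est_segments)
    return {seq: n for seq, n in counts.items()
            if seq != reference and n >= min_support}

ref = "PEEEEEEK"
ests = ["PEEEEEEK"] * 5 + ["PEEEEEEEK"] * 2 + ["PEEGEEEK"]
print(call_variants(ref, ests))  # {'PEEEEEEEK': 2}

# Fractions quoted in the text.
print(round(100 * 115 / 2227, 1))  # 5.2
print(round(100 * 26 / 123))       # 21
```
]]></p>
				<p>In the toy call, the expansion seen in two ESTs is reported, while the substitution seen in a single EST is discarded by the two-EST support rule.</p>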
				<tbl id="T1" hint_layout="double">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Human amino acid repeat variants</p>
					</caption>
					<tblbdy cols="7">
						<r>
							<c ca="left">
								<p>Repeat type</p>
							</c>
							<c ca="center">
								<p>Number of repeats with EST coverage*</p>
							</c>
							<c ca="center">
								<p>Average codon homogeneity</p>
							</c>
							<c ca="center">
								<p>Average number of ESTs</p>
							</c>
							<c ca="center">
								<p>Polymorphic repeats (%)</p>
							</c>
							<c ca="center">
								<p>Polymorphic up-down (%)<sup>&#8224;</sup></p>
							</c>
							<c ca="center">
								<p>Gap/total repeat variants (%)<sup>&#8225;</sup></p>
							</c>
						</r>
						<r>
							<c cspan="7">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>All</p>
							</c>
							<c ca="center">
								<p>2,227</p>
							</c>
							<c ca="center">
								<p>0.49</p>
							</c>
							<c ca="center">
								<p>27.4</p>
							</c>
							<c ca="center">
								<p>115 (5.2)</p>
							</c>
							<c ca="center">
								<p>110-106 (4.8)</p>
							</c>
							<c ca="center">
								<p>60/137 (44%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>A</p>
							</c>
							<c ca="center">
								<p>249</p>
							</c>
							<c ca="center">
								<p>0.37</p>
							</c>
							<c ca="center">
								<p>35.5</p>
							</c>
							<c ca="center">
								<p>14 (5.6)</p>
							</c>
							<c ca="center">
								<p>16-9 (5)</p>
							</c>
							<c ca="center">
								<p>8/17 (47%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>E</p>
							</c>
							<c ca="center">
								<p>487</p>
							</c>
							<c ca="center">
								<p>0.55</p>
							</c>
							<c ca="center">
								<p>28.8</p>
							</c>
							<c ca="center">
								<p>31 (6.4)</p>
							</c>
							<c ca="center">
								<p>20-20 (4.1)</p>
							</c>
							<c ca="center">
								<p>15/35 (43%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>G</p>
							</c>
							<c ca="center">
								<p>193</p>
							</c>
							<c ca="center">
								<p>0.48</p>
							</c>
							<c ca="center">
								<p>27.7</p>
							</c>
							<c ca="center">
								<p>12 (6.2)</p>
							</c>
							<c ca="center">
								<p>13-12 (6.5)</p>
							</c>
							<c ca="center">
								<p>4/15 (27%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>L</p>
							</c>
							<c ca="center">
								<p>210</p>
							</c>
							<c ca="center">
								<p>0.55</p>
							</c>
							<c ca="center">
								<p>26.8</p>
							</c>
							<c ca="center">
								<p>5 (2.4)</p>
							</c>
							<c ca="center">
								<p>11-13 (5.7)</p>
							</c>
							<c ca="center">
								<p>4/5 (80%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>P</p>
							</c>
							<c ca="center">
								<p>312</p>
							</c>
							<c ca="center">
								<p>0.41</p>
							</c>
							<c ca="center">
								<p>26.8</p>
							</c>
							<c ca="center">
								<p>17 (5.4)</p>
							</c>
							<c ca="center">
								<p>17-17 (5.4)</p>
							</c>
							<c ca="center">
								<p>3/22 (14%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>S</p>
							</c>
							<c ca="center">
								<p>315</p>
							</c>
							<c ca="center">
								<p>0.41</p>
							</c>
							<c ca="center">
								<p>22.7</p>
							</c>
							<c ca="center">
								<p>9 (2.9)</p>
							</c>
							<c ca="center">
								<p>10-8 (2.8)</p>
							</c>
							<c ca="center">
								<p>2/11 (18%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>K</p>
							</c>
							<c ca="center">
								<p>134</p>
							</c>
							<c ca="center">
								<p>0.5</p>
							</c>
							<c ca="center">
								<p>36.9</p>
							</c>
							<c ca="center">
								<p>7 (5.2)</p>
							</c>
							<c ca="center">
								<p>6-10 (5.9)</p>
							</c>
							<c ca="center">
								<p>3/7 (43%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>137</p>
							</c>
							<c ca="center">
								<p>0.66</p>
							</c>
							<c ca="center">
								<p>19.7</p>
							</c>
							<c ca="center">
								<p>14 (10.2)</p>
							</c>
							<c ca="center">
								<p>8-9 (6.2)</p>
							</c>
							<c ca="center">
								<p>15/17 (88%)</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Number of repeats covered by at least four ESTs. <sup>&#8224;</sup>Number of polymorphic sequences immediately upstream (up) and downstream (down) of repeats; the percentages in parentheses were calculated by pooling the upstream and downstream sequences. <sup>&#8225;</sup>Number of repeat polymorphic variants involving gaps relative to the total number of variants.</p>
					</tblfn>
				</tbl>
				<p>We detected 137 different polymorphic variants in the 115 polymorphic amino acid repeats. We classified them into variants containing gaps (indels), of which there were 60 (43.8%), and variants containing only amino acid substitutions, of which there were 77 (56.2%) (Additional data file 1). We also measured the repeat codon homogeneity, defined as the fraction of the repeat encoded by a perfect codon run. Across amino acid repeat types, a high average codon homogeneity was generally associated with a high percentage of gap polymorphic variants (Table <tblr tid="T1">1</tblr>). Glutamine repeats showed the highest frequency of gap polymorphisms (88% of the glutamine polymorphic variants) and the highest average codon homogeneity (0.66), whereas proline and serine repeats showed the lowest gap polymorphism frequencies (14% and 18%, respectively) and the lowest average codon homogeneity (0.41 in both cases).</p>
				<p>We also analyzed polymorphisms supported by &#8805;4 ESTs, which should be enriched in the more common polymorphic variants; they comprised 38% of the polymorphism data. In this subset, repeat gap polymorphisms outnumbered substitutions (35 versus 17).</p>
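				<p>A sketch of the codon homogeneity measure and the gap versus substitution classification described above follows. Reading 'perfect codon run' as the longest stretch of consecutive identical codons is an assumption, as is the simple length-based rule used to separate gap variants from substitution-only variants; both function names are hypothetical.</p>
				<p><![CDATA[
```python
from collections import Counter

def codon_homogeneity(coding_seq):
    """Longest run of one identical codon divided by the number of codons.

    Assumption: 'perfect codon run' is read as the longest stretch of
    consecutive identical codons; the authors' definition may differ.
    """
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)]
    if not codons:
        return 0.0
    longest = current = 1
    for prev, cur in zip(codons, codons[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest / len(codons)

def classify_variant(reference, variant):
    """'gap' if the variant changes the region length, otherwise
    'substitution'; returns None for a sequence identical to the reference."""
    if variant == reference:
        return None
    return "gap" if len(variant) != len(reference) else "substitution"

print(codon_homogeneity("CAGCAGCAGCAGCAG"))  # 1.0
print(codon_homogeneity("CAGCAGCAACAGCAG"))  # 0.4
print(Counter(classify_variant("QQQQQ", v) for v in ["QQQQQQ", "QQQQ", "QQHQQ"]))
```
]]></p>
				<p>Under this rule a variant that carries both an indel and a substitution counts as a gap variant, matching the split between variants containing gaps and those containing only substitutions.</p>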
			</sec>
			<sec>
				<st>
					<p>Analysis of repeat adjacent sequences</p>
				</st>
				<p>We compared the repeat polymorphism levels with those of the sequences immediately adjacent to the repeats, considering, on each side of the repeat, a sequence of the same length as the corresponding repeat (Additional data file 1). The overall frequency of polymorphic variants was similar to that found within the repeats (4.8% versus 5.2%), but the number of polymorphisms containing gaps was markedly lower: 8 in upstream regions and 14 in downstream regions, about five times fewer than within repeats (60 cases). In contrast, substitutions were slightly more common outside the repeats than inside them: 103 and 93 in upstream and downstream regions, respectively, compared to 77 within repeats.</p>
				<p>Among polymorphisms supported by &#8805;4 ESTs, gap polymorphisms remained more frequent within repeats than outside them (35 versus 3 in upstream and 11 in downstream sequences), whereas substitutions differed only slightly (21 in upstream and 26 in downstream sequences, compared to 17 within repeats).</p>
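				<p>The comparison windows used in this section, and the two headline comparisons for all supported variants, can be sketched as below. Clipping windows at the protein termini is an assumption (edge handling is not described in the text), and adjacent_windows is a hypothetical helper.</p>
				<p><![CDATA[
```python
def adjacent_windows(protein, start, end):
    """Upstream and downstream windows of the same length as the repeat.

    start and end are 0-based, end-exclusive repeat coordinates. Windows
    are clipped at the protein termini (an assumption; edge handling is
    not described in the text).
    """
    size = end - start
    upstream = protein[max(0, start - size):start]
    downstream = protein[end:end + size]
    return upstream, downstream

seq = "MKTAYIEEEEEQRLGSW"
print(adjacent_windows(seq, 6, 11))  # ('KTAYI', 'QRLGS')

# Gap polymorphisms: 60 within repeats versus 8 upstream and 14 downstream.
print(round(60 / ((8 + 14) / 2), 1))  # 5.5
# Pooled flanking polymorphism level (Table 1): 110 + 106 of 2 x 2,227.
print(round(100 * (110 + 106) / (2 * 2227), 1))  # 4.8
```
]]></p>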
			</sec>
			<sec>
				<st>
					<p>Types of polymorphism by amino acid repeat type</p>
				</st>
				<p>We compared the number of polymorphisms involving gaps or substitutions in different amino acid repeat types and adjacent regions (Figure <figr fid="F1">1</figr>). This analysis showed that the larger number of substitutions outside the repeats than inside them, noted above, could be attributed mainly to leucine repeats (10 and 12 polymorphic variants in upstream and downstream sequences, respectively, versus only one within the repeat). Conversely, glutamine, alanine and glutamic acid repeats were the main contributors to the excess of gap polymorphisms inside the repeats relative to outside. The ratio between gap and substitution polymorphisms was highest for glutamine (15 versus 2) and lowest for proline (3 versus 22), indicating strong differences in susceptibility to slippage among repeat types.</p>
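				<p>The per-type comparison can be summarized with the convention used in Figure 1, where the two flanking regions are reduced to their mean. This hypothetical helper reads the figure's bars as spanning the upstream and downstream values, which is an interpretation of the legend; the numbers are the counts quoted in this paragraph.</p>
				<p><![CDATA[
```python
def figure1_summary(within, upstream, downstream):
    """Within-repeat count, mean of the two flanking counts, and the
    two raw flanking values (read here as the bar range)."""
    return {
        "within": within,
        "adjacent_mean": (upstream + downstream) / 2,
        "adjacent_range": (min(upstream, downstream), max(upstream, downstream)),
    }

# Leucine-repeat substitutions quoted in the text: 1 within, 10 up, 12 down.
print(figure1_summary(1, 10, 12))

# Gap-to-substitution ratios inside repeats, using the counts in the text.
ratios = {"Q": 15 / 2, "P": 3 / 22}
print({aa: round(r, 2) for aa, r in ratios.items()})  # {'Q': 7.5, 'P': 0.14}
```
]]></p>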
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Number of polymorphic variants for regions containing different kinds of amino acid repeats</p>
					</caption>
					<text>
						<p>Number of polymorphic variants for regions containing different kinds of amino acid repeats. For the upstream and downstream sequences adjacent to the repeat, the average of the two values is shown; bars indicate the actual values for each of the two adjacent sides.</p>
					</text>
					<graphic file="gb-2006-7-4-r33-1" hint_layout="double"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Position and nature of amino acid substitutions</p>
				</st>
				<p>We investigated the frequency of the different amino acid substitutions in repeats and adjacent sequences, focusing on the eight most common amino acids forming tandem repeats (Table <tblr tid="T2">2</tblr>). The aim was to identify possible biases in the amino acid substitution patterns inside repeats relative to the adjacent sequences, as such biases could be informative about specific selective constraints operating in the repetitive structures. The dataset analyzed comprised 79 substitutions inside repeats and 135 in adjacent regions. First, we determined that the vast majority of amino acid substitutions could be explained by single non-synonymous nucleotide changes. Inspection of the types of amino acid substitutions in repeats and adjacent sequences indicated no major differences between them. For example, nearly all amino acid substitutions that occurred at least five times in the adjacent sequences, representing the most common amino acid replacements, were also observed inside the repeats. The only exception was the replacement of G by V, with seven cases in adjacent sequences versus none within repeats. Given the low number of cases, however, this observation should be treated with caution.</p>
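				<p>Whether an amino acid replacement can arise from a single non-synonymous nucleotide change can be checked against the standard genetic code, as sketched below. The table is NCBI translation table 1 written in the usual TCAG order; one_step_reachable is a hypothetical helper that ignores codon usage and mutation bias.</p>
				<p><![CDATA[
```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def one_step_reachable(aa_from, aa_to):
    """True if some codon for aa_from becomes a codon for aa_to by changing
    exactly one nucleotide (a single non-synonymous point mutation)."""
    for codon, aa in CODE.items():
        if aa != aa_from:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa_to:
                    return True
    return False

print(one_step_reachable("G", "V"))  # True  (for example GGT to GTT)
print(one_step_reachable("W", "A"))  # False (TGG needs two changes)
```
]]></p>
				<p>As a sanity check, applying this function to every within-repeat replacement with a non-zero count in Table 2 returns True in each case, consistent with the single-nucleotide explanation given above.</p>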
				<tbl id="T2" hint_layout="double">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Amino acid substitutions in polymorphic variants</p>
					</caption>
					<tblbdy cols="22">
						<r>
							<c ca="left">
								<p>From/to</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>G</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="center">
								<p>P</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>V</p>
							</c>
							<c ca="center">
								<p>I</p>
							</c>
							<c ca="center">
								<p>M</p>
							</c>
							<c ca="center">
								<p>F</p>
							</c>
							<c ca="center">
								<p>W</p>
							</c>
							<c ca="center">
								<p>T</p>
							</c>
							<c ca="center">
								<p>C</p>
							</c>
							<c ca="center">
								<p>Y</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>D</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="center">
								<p>H</p>
							</c>
							<c ca="center">
								<p>
									<b>Total</b>
								</p>
							</c>
						</r>
						<r>
							<c cspan="22">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Substitutions within repeats</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>A</b>
								</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>9</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>E</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>21</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>G</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>13</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>L</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>1</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>P</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>
									<b>20</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>S</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>9</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>K</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>4</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>
									<b>2</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Total</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>9</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>2</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>8</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>1</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>3</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>8</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>8</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>6</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>6</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>0</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>0</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>3</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>0</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>9</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>2</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>0</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>1</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>5</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>6</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>2</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>79</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Rel. frequency</p>
							</c>
							<c ca="center">
								<p>0.11</p>
							</c>
							<c ca="center">
								<p>0.03</p>
							</c>
							<c ca="center">
								<p>0.10</p>
							</c>
							<c ca="center">
								<p>0.01</p>
							</c>
							<c ca="center">
								<p>0.04</p>
							</c>
							<c ca="center">
								<p>0.10</p>
							</c>
							<c ca="center">
								<p>0.10</p>
							</c>
							<c ca="center">
								<p>0.08</p>
							</c>
							<c ca="center">
								<p>0.08</p>
							</c>
							<c ca="center">
								<p>0.00</p>
							</c>
							<c ca="center">
								<p>0.00</p>
							</c>
							<c ca="center">
								<p>0.04</p>
							</c>
							<c ca="center">
								<p>0.00</p>
							</c>
							<c ca="center">
								<p>0.11</p>
							</c>
							<c ca="center">
								<p>0.03</p>
							</c>
							<c ca="center">
								<p>0.00</p>
							</c>
							<c ca="center">
								<p>0.01</p>
							</c>
							<c ca="center">
								<p>0.06</p>
							</c>
							<c ca="center">
								<p>0.08</p>
							</c>
							<c ca="center">
								<p>0.03</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>Substitutions in repeat adjacent sequences</b>*</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>A</b>
								</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>21</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>E</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>13</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>G</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>26</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>L</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>15</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>P</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>
									<b>24</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>S</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>13</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>K</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>
									<b>11</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>
									<b>12</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Total</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>14</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>6</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>9</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>7</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>7</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>13</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>7</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>7</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>17</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>2</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>4</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>6</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>2</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>6</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>4</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>0</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>3</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>7</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>10</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>4</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>135</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Rel. frequency</p>
							</c>
							<c ca="center">
								<p>0.10</p>
							</c>
							<c ca="center">
								<p>0.04</p>
							</c>
							<c ca="center">
								<p>0.07</p>
							</c>
							<c ca="center">
								<p>0.05</p>
							</c>
							<c ca="center">
								<p>0.05</p>
							</c>
							<c ca="center">
								<p>0.10</p>
							</c>
							<c ca="center">
								<p>0.05</p>
							</c>
							<c ca="center">
								<p>0.05</p>
							</c>
							<c ca="center">
								<p>0.13</p>
							</c>
							<c ca="center">
								<p>0.01</p>
							</c>
							<c ca="center">
								<p>0.03</p>
							</c>
							<c ca="center">
								<p>0.04</p>
							</c>
							<c ca="center">
								<p>0.01</p>
							</c>
							<c ca="center">
								<p>0.04</p>
							</c>
							<c ca="center">
								<p>0.03</p>
							</c>
							<c ca="center">
								<p>0.00</p>
							</c>
							<c ca="center">
								<p>0.02</p>
							</c>
							<c ca="center">
								<p>0.05</p>
							</c>
							<c ca="center">
								<p>0.07</p>
							</c>
							<c ca="center">
								<p>0.03</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Upstream and downstream sequences taken together. Rel. frequency, relative frequency.</p>
					</tblfn>
				</tbl>
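				<p>Each relative frequency in the table above is a column total divided by the grand total of its block (for example, 9 of 79 substitutions gives 0.11). The following Python sketch recomputes row totals, column totals and rounded relative frequencies from a count matrix, which also serves as a quick consistency check on the printed totals. The dictionary layout and the function name are illustrative assumptions, not part of the original analysis.</p>

```python
from typing import Dict, List, Tuple


def totals_and_frequencies(
    matrix: Dict[str, List[int]], digits: int = 2
) -> Tuple[Dict[str, int], List[int], int, List[float]]:
    """Row totals, column totals, grand total and relative frequencies.

    `matrix` maps a row label (for example, the repeat amino acid) to its
    per-column substitution counts; every row must have the same length.
    """
    rows = list(matrix.values())
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of columns")
    row_totals = {label: sum(counts) for label, counts in matrix.items()}
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    grand_total = sum(col_totals)
    # Relative frequency of each column = column total / grand total.
    rel_freq = [round(t / grand_total, digits) for t in col_totals]
    return row_totals, col_totals, grand_total, rel_freq
```

Comparing the recomputed row and column totals against the printed Total column and row is a cheap way to catch transcription errors in a count table.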
				<p>In addition, we inspected the relative position of substitutions inside the repeats. This could reveal biases in the positions at which substitutions occur more often. For example, an excess of substitutions at the repeat extremes could indicate selective pressure to preserve a minimum repeat length. However, the observed positions of amino acid substitutions were, overall, not significantly different from the distribution expected if substitutions were located at random (see Materials and methods and Additional data file 1). In conclusion, this analysis did not detect any specific differences in the selective constraints related to different amino acid substitutions inside or outside repeats, or in the relative position of the substitutions within the repeats.</p>
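				<p>The idea behind checking the repeat extremes can be illustrated with a simple null model. Assuming that each position of a repeat of length L is equally likely to carry a substitution, a substitution falls on the first or last residue with probability 2/L (or 1 when L is 1), so summing these probabilities gives the expected number of extreme-position substitutions. The Python sketch below compares that expectation with the observed count; it is a hypothetical illustration, not necessarily the procedure described in Materials and methods, and a formal significance test would still be needed.</p>

```python
from typing import Iterable, Tuple


def extreme_position_counts(
    substitutions: Iterable[Tuple[int, int]],
) -> Tuple[int, float]:
    """Observed and expected substitutions at repeat extremes.

    Each item is (position, repeat_length), with 0-based positions. Under a
    uniform null model, a substitution in a repeat of length L falls on the
    first or last residue with probability min(2, L) / L.
    """
    observed = 0
    expected = 0.0
    for position, length in substitutions:
        if position not in range(length):
            raise ValueError(f"position {position} outside repeat of length {length}")
        if position in (0, length - 1):
            observed += 1
        expected += min(2, length) / length
    return observed, expected
```

An observed count well above the expected value would point toward the kind of selective pressure on repeat extremes discussed above.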
			</sec>
			<sec>
				<st>
					<p>Non-synonymous versus synonymous substitutions</p>
				</st>
				<p>Another aspect we studied was the relative frequency of synonymous and non-synonymous substitutions inside repeats and in repeat adjacent regions. We identified all the synonymous and non-synonymous nucleotide changes in the EST dataset and divided each count by the total number of synonymous or non-synonymous positions analyzed, respectively (Figure <figr fid="F2">2</figr> and Additional data file 1). In both types of region, the frequency of non-synonymous substitutions was lower than that of synonymous substitutions, as expected if some of the substitutions resulting in amino acid changes were negatively selected. The frequency of synonymous substitutions was very similar inside and outside the repeats: 0.015 (1.5% of sites) inside repeats, and 0.014 and 0.016 (1.4% and 1.6% of sites) in repeat upstream and downstream regions, respectively. In agreement with the results obtained for amino acid substitutions, the frequency of non-synonymous substitutions was similar inside repeats (0.009; 0.9% of sites) and outside repeats (0.011; 1.1% of sites in both upstream and downstream regions). By amino acid type, only proline and glutamine repeats showed a non-synonymous substitution pattern different from that of their adjacent sequences. In proline repeats, the frequency of non-synonymous substitutions was 0.02, whereas the average of the two adjacent regions was 0.012; that is, there appeared to be an almost two-fold excess of non-synonymous changes within the repeats. Glutamine repeats showed the opposite trend, with a non-synonymous substitution frequency of 0.005 inside the repeats versus 0.011 in the adjacent regions.</p>
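				<p>Frequencies such as 0.015 synonymous substitutions per site require an estimate of how many synonymous and non-synonymous sites a sequence contains. The Python sketch below follows the site-counting idea of Nei and Gojobori: for each codon, every possible single-nucleotide change is classified as synonymous or not under the standard genetic code, and the synonymous fraction at each codon position is summed. As a simplifying assumption, changes to stop codons are counted here as non-synonymous (conventions differ), and the function names are hypothetical; the original analysis may have counted sites differently.</p>

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks stop codons).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon: str) -> float:
    """Number of synonymous sites (0 to 3) in one sense codon."""
    codon = codon.upper()
    aa = GENETIC_CODE[codon]
    if aa == "*":
        raise ValueError(f"{codon} is a stop codon")
    sites = 0.0
    for pos in range(3):
        synonymous = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # A change to a stop codon never matches aa, so it is
            # counted as non-synonymous here.
            if GENETIC_CODE[mutant] == aa:
                synonymous += 1
        sites += synonymous / 3
    return sites


def count_sites(cds: str) -> Tuple[float, float]:
    """(synonymous, non-synonymous) sites summed over an in-frame sequence."""
    if len(cds) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    syn = sum(synonymous_sites(c) for c in codons)
    return syn, 3 * len(codons) - syn
```

A synonymous substitution frequency is then the number of observed synonymous changes divided by the synonymous site total, and likewise for non-synonymous changes and sites.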
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Frequency of synonymous and non-synonymous nucleotide substitutions for regions containing different kinds of amino acid repeats</p>
					</caption>
					<text>
						<p>Frequency of synonymous and non-synonymous nucleotide substitutions for regions containing different kinds of amino acid repeats. For the upstream and downstream sequences adjacent to the repeat, the average value is shown; bars indicate the values for each of the two adjacent sides.</p>
					</text>
					<graphic file="gb-2006-7-4-r33-2" hint_layout="double"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Relationship between polymorphism and codon homogeneity</p>
				</st>
				<p>We next compared the codon homogeneity values of all repeats to those of the repeats associated with gap polymorphisms or with substitution polymorphisms (Figure <figr fid="F3">3</figr>). The average value was 0.49 for all the repeats analyzed, 0.44 for those with substitution variants and 0.65 for those with gap variants. Repeats with gap polymorphisms had higher codon homogeneity values than average (<it>p</it> = 0.001, Kolmogorov-Smirnov test), whereas those with substitution variants were similar to the general repeat population. These results are expected if we consider that slippage acts mainly on long pure codon tracts, resulting in expansions and contractions of the repeats. Interestingly, however, a long pure codon tract does not appear to be indispensable for this type of polymorphism, as in about 25% of the cases the longest pure codon run was short, between 1 and 3 codon units.</p>
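				<p>The Kolmogorov-Smirnov comparison above is based on the largest vertical distance between the empirical cumulative distributions of two samples, here the codon homogeneity values of repeats with gap variants and of all repeats. The Python sketch below computes only that two-sample statistic D from plain lists of values (an assumed input format); in practice, scipy.stats.ks_2samp returns both D and a p-value. The sketch is illustrative and not necessarily the implementation used in this study.</p>

```python
from bisect import bisect_right
from typing import Sequence


def ks_two_sample_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so evaluating them at every pooled value finds the maximum gap.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d
```

D ranges from 0 (identical empirical distributions) to 1 (non-overlapping samples); converting it to a p-value additionally depends on the two sample sizes.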
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Codon homogeneity distribution of the sequence regions encoding different types of repeats: polymorphic with substitutions, polymorphic with expansions or contractions (gaps), all repeats</p>
					</caption>
					<text>
						<p>Codon homogeneity distribution of the sequence regions encoding different types of repeats: polymorphic with substitutions, polymorphic with expansions or contractions (gaps), all repeats. Codon homogeneity value intervals labeled as X-Y stand for values &gt;X and &lt;=Y (for example, 0-0.2 are values &gt;0 and &lt;= 0.2).</p>
					</text>
					<graphic file="gb-2006-7-4-r33-3" hint_layout="single"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Repeat expansion/contraction polymorphisms</p>
				</st>
				<p>Polymorphisms involving the expansion or contraction of repeats (that is, those involving gaps) are of particular interest because of their potential to cause disease. Table <tblr tid="T3">3</tblr> lists genes containing this type of polymorphism for the most abundant amino acid repeat types. Among them we detected two poly-glutamine-containing genes known to be associated with neurodegenerative disorders: dentatorubral-pallidoluysian atrophy protein (DRPLA) and spinocerebellar ataxia type 6 protein (voltage-dependent P/Q-type calcium channel alpha-1A subunit, or CACNA1A). These two disease loci contained long runs (19 and 13 glutamines, respectively) and high codon homogeneity levels (0.79 and 1, respectively). Other genes in the list with long homopeptide runs and high codon homogeneity are thus possible candidates for disease association. Transcription factors and RNA-binding proteins were abundant among the genes showing expansion/contraction polymorphisms. Most of the polymorphic variants differed from the reference repeat by one repeat unit, the maximum difference being three units. Our EST match criterion of &gt;90% identity would hinder the detection of larger size differences, but given that only two variants showed a difference of three repeat units, such cases are expected to be rare. In 12 cases, the longest pure codon run spanned the entire repeat (codon homogeneity of 1). Length polymorphisms were most frequently associated with CAG (glutamine), GAG (glutamic acid) and CTG/GCT (leucine/alanine) codons.</p>
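				<p>The codon homogeneity values in Table 3 are consistent with dividing the length of the longest run of a single codon by the repeat size in codons (for example, a longest run of 15 CAG codons in the 19-residue atrophin-1 repeat gives 15/19, or 0.79). The Python sketch below computes the longest pure codon run and this ratio from the nucleotide sequence encoding a repeat; it is inferred from the table columns, and the function name and input format are assumptions.</p>

```python
from typing import Tuple


def codon_homogeneity(repeat_cds: str) -> Tuple[str, int, float]:
    """Return (codon of longest run, longest run, homogeneity) for a repeat.

    Homogeneity is taken here as the longest run of one identical codon
    divided by the number of codons encoding the repeat.
    """
    if not repeat_cds or len(repeat_cds) % 3:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    codons = [repeat_cds[i:i + 3].upper() for i in range(0, len(repeat_cds), 3)]
    best_codon, best_run = codons[0], 1
    run = 1
    for previous, current in zip(codons, codons[1:]):
        # Extend the current run if the codon repeats, otherwise restart it.
        run = run + 1 if current == previous else 1
        if run > best_run:
            best_codon, best_run = current, run
    return best_codon, best_run, best_run / len(codons)
```

When two runs tie for the longest, this sketch keeps the first one encountered.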
				<tbl id="T3" hint_layout="double">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Repeat gap polymorphic variants</p>
					</caption>
					<tblbdy cols="12">
						<r>
							<c ca="left">
								<p>Ensembl ID</p>
							</c>
							<c ca="left">
								<p>LocusLink ID</p>
							</c>
							<c ca="center">
								<p>AA</p>
							</c>
							<c ca="center">
								<p>Position*</p>
							</c>
							<c ca="center">
								<p>Size*</p>
							</c>
							<c ca="center">
								<p>Size variant</p>
							</c>
							<c ca="center">
								<p>Len. protein*</p>
							</c>
							<c ca="center">
								<p>Number of ESTs<sup>&#8224;</sup></p>
							</c>
							<c ca="center">
								<p>Codon max run<sup>&#8225;</sup></p>
							</c>
							<c ca="center">
								<p>Codon hom.<sup>&#167;</sup></p>
							</c>
							<c ca="center">
								<p>Max run size<sup>&#8225;</sup></p>
							</c>
							<c ca="left">
								<p>Description</p>
							</c>
						</r>
						<r>
							<c cspan="12">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000282388</p>
							</c>
							<c ca="left">
								<p>ZFP36L2</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>394</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>494</p>
							</c>
							<c ca="center">
								<p>195</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="left">
								<p>Butyrate response factor 2 (TIS11D protein)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000324790</p>
							</c>
							<c ca="left">
								<p>TDE2L</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>363</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>455</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="left">
								<p>Tumor differentially expressed 2-like</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000317661</p>
							</c>
							<c ca="left">
								<p>CACNA1A</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>2,311</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>2,505</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="left">
								<p>Voltage-dependent P/Q-type calcium channel alpha-1A subunit (CACNA1A)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000280665</p>
							</c>
							<c ca="left">
								<p>DCP1B</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>251</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>617</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.90</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="left">
								<p>mRNA decapping enzyme 1B</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000348018</p>
							</c>
							<c ca="left">
								<p>ZNF384</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>439</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="center">
								<p>516</p>
							</c>
							<c ca="center">
								<p>23</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.88</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="left">
								<p>Zinc finger protein 384 (nuclear matrix transcription factor 4)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000264883</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>92</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>507</p>
							</c>
							<c ca="center">
								<p>33</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.80</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="left">
								<p>Nucleoporin p54 (54 kDa nucleoporin)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000229279</p>
							</c>
							<c ca="left">
								<p>ATN1</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>482</p>
							</c>
							<c ca="center">
								<p>19</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>1,189</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.79</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="left">
								<p>Atrophin-1 (dentatorubral-pallidoluysian atrophy protein; DRPLA)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000265773</p>
							</c>
							<c ca="left">
								<p>SMARCA2</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>215</p>
							</c>
							<c ca="center">
								<p>23</p>
							</c>
							<c ca="center">
								<p>22</p>
							</c>
							<c ca="center">
								<p>1,590</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.57</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="left">
								<p>Possible global transcription activator SNF2L2 (SNF2-alpha)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000354597</p>
							</c>
							<c ca="left">
								<p>KIAA0476</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>815</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>1,417</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.56</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="left">
								<p>Unknown function</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000272804</p>
							</c>
							<c ca="left">
								<p>KIAA1946</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>42</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>15,16</p>
							</c>
							<c ca="center">
								<p>428</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.43</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="left">
								<p>KIAA1946</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000313603</p>
							</c>
							<c ca="left">
								<p>ABCF1</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>63</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>9,11</p>
							</c>
							<c ca="center">
								<p>845</p>
							</c>
							<c ca="center">
								<p>20</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.40</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="left">
								<p>ATP-binding cassette, sub-family F, member 1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000252891</p>
							</c>
							<c ca="left">
								<p>NUMBL</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>426</p>
							</c>
							<c ca="center">
								<p>20</p>
							</c>
							<c ca="center">
								<p>18</p>
							</c>
							<c ca="center">
								<p>609</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.35</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="left">
								<p>Numb-like protein (Numb-R)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000304689</p>
							</c>
							<c ca="left">
								<p>THAP11</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>103</p>
							</c>
							<c ca="center">
								<p>29</p>
							</c>
							<c ca="center">
								<p>28</p>
							</c>
							<c ca="center">
								<p>314</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.34</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="left">
								<p>THAP domain protein 11 (HRIHFB2206)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000345671</p>
							</c>
							<c ca="left">
								<p>NCOA3</p>
							</c>
							<c ca="center">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>1,243</p>
							</c>
							<c ca="center">
								<p>29</p>
							</c>
							<c ca="center">
								<p>28</p>
							</c>
							<c ca="center">
								<p>1,420</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>CAG</p>
							</c>
							<c ca="center">
								<p>0.31</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="left">
								<p>Nuclear receptor coactivator 3 isoform b</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000301187</p>
							</c>
							<c ca="left">
								<p>TMC4</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>706</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
							<c ca="center">
								<p>GAG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="left">
								<p>Transmembrane channel-like 4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000315064</p>
							</c>
							<c ca="left">
								<p>MAGEF1</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>152</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>4,7</p>
							</c>
							<c ca="center">
								<p>307</p>
							</c>
							<c ca="center">
								<p>49</p>
							</c>
							<c ca="center">
								<p>GAG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="left">
								<p>Melanoma-associated antigen F1 (MAGE-F1 antigen)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000340702</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>630</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>9,11</p>
							</c>
							<c ca="center">
								<p>686</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>GAG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="left">
								<p>106 kDa O-GlcNAc transferase-interacting protein</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000262680</p>
							</c>
							<c ca="left">
								<p>NRD1</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>149</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>1,219</p>
							</c>
							<c ca="center">
								<p>33</p>
							</c>
							<c ca="center">
								<p>GAA</p>
							</c>
							<c ca="center">
								<p>0.80</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="left">
								<p>Nardilysin precursor (EC 3.4.24.61) (N-arginine dibasic convertase)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000252455</p>
							</c>
							<c ca="left">
								<p>PRKCSH</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>312</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
							<c ca="center">
								<p>528</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="center">
								<p>GAG</p>
							</c>
							<c ca="center">
								<p>0.77</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="left">
								<p>Glucosidase II beta subunit precursor (PKCSH)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000253237</p>
							</c>
							<c ca="left">
								<p>GRWD1</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>123</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>446</p>
							</c>
							<c ca="center">
								<p>79</p>
							</c>
							<c ca="center">
								<p>GAA</p>
							</c>
							<c ca="center">
								<p>0.50</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="left">
								<p>Glutamate-rich WD-repeat protein 1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000262710</p>
							</c>
							<c ca="left">
								<p>ACIN1</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>269</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>1,341</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>GAG</p>
							</c>
							<c ca="center">
								<p>0.50</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="left">
								<p>Apoptotic chromatin condensation inducer in the nucleus (Acinus)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000346324</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>60</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>109</p>
							</c>
							<c ca="center">
								<p>249</p>
							</c>
							<c ca="center">
								<p>GAG</p>
							</c>
							<c ca="center">
								<p>0.43</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="left">
								<p>Predicted: similar to prothymosin alpha</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000263274</p>
							</c>
							<c ca="left">
								<p>LIG1</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>152</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>919</p>
							</c>
							<c ca="center">
								<p>19</p>
							</c>
							<c ca="center">
								<p>GAG/GAA</p>
							</c>
							<c ca="center">
								<p>0.33</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="left">
								<p>DNA ligase I (polydeoxyribonucleotide synthase [ATP])</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000304498</p>
							</c>
							<c ca="left">
								<p>PODXL2</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>161</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>529</p>
							</c>
							<c ca="center">
								<p>39</p>
							</c>
							<c ca="center">
								<p>GAG</p>
							</c>
							<c ca="center">
								<p>0.27</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="left">
								<p>Endoglycan</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000345444</p>
							</c>
							<c ca="left">
								<p>APLP2</p>
							</c>
							<c ca="center">
								<p>E</p>
							</c>
							<c ca="center">
								<p>220</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>707</p>
							</c>
							<c ca="center">
								<p>84</p>
							</c>
							<c ca="center">
								<p>GAG/GAA</p>
							</c>
							<c ca="center">
								<p>0.14</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Amyloid-like protein 2 precursor (CDEI-box binding protein)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000350479</p>
							</c>
							<c ca="left">
								<p>RPL14</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="center">
								<p>149</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>11,12</p>
							</c>
							<c ca="center">
								<p>215</p>
							</c>
							<c ca="center">
								<p>213</p>
							</c>
							<c ca="center">
								<p>GCT</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="left">
								<p>60S ribosomal protein L14 (CAG-ISL 7)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000255608</p>
							</c>
							<c ca="left">
								<p>BTBD2</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="center">
								<p>40</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>15,16</p>
							</c>
							<c ca="center">
								<p>525</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>GCC</p>
							</c>
							<c ca="center">
								<p>0.93</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="left">
								<p>BTB/POZ domain containing protein 2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000305783</p>
							</c>
							<c ca="left">
								<p>RBM23</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="center">
								<p>368</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>423</p>
							</c>
							<c ca="center">
								<p>53</p>
							</c>
							<c ca="center">
								<p>GCT</p>
							</c>
							<c ca="center">
								<p>0.56</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="left">
								<p>RNA-binding region containing protein 4 (splicing factor SF2)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000346678</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="center">
								<p>130</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>232</p>
							</c>
							<c ca="center">
								<p>50</p>
							</c>
							<c ca="center">
								<p>GCA</p>
							</c>
							<c ca="center">
								<p>0.33</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="left">
								<p>Similar to splicing factor, arginine/serine-rich 4 isoform c</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000330188</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="center">
								<p>266</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>434</p>
							</c>
							<c ca="center">
								<p>50</p>
							</c>
							<c ca="center">
								<p>GCA/GCT</p>
							</c>
							<c ca="center">
								<p>0.20</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Similar to splicing factor, arginine/serine-rich 4 isoform c</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000324573</p>
							</c>
							<c ca="left">
								<p>FLII</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="center">
								<p>410</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>1,269</p>
							</c>
							<c ca="center">
								<p>25</p>
							</c>
							<c ca="center">
								<p>GCA/GCT</p>
							</c>
							<c ca="center">
								<p>0.17</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Flightless-I protein homolog</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000255631</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>G</p>
							</c>
							<c ca="center">
								<p>24</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>359</p>
							</c>
							<c ca="center">
								<p>96</p>
							</c>
							<c ca="center">
								<p>GGC</p>
							</c>
							<c ca="center">
								<p>0.83</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="left">
								<p>hsp70-interacting protein</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000246533</p>
							</c>
							<c ca="left">
								<p>CAPNS1</p>
							</c>
							<c ca="center">
								<p>G</p>
							</c>
							<c ca="center">
								<p>36</p>
							</c>
							<c ca="center">
								<p>20</p>
							</c>
							<c ca="center">
								<p>21</p>
							</c>
							<c ca="center">
								<p>268</p>
							</c>
							<c ca="center">
								<p>100</p>
							</c>
							<c ca="center">
								<p>GGC</p>
							</c>
							<c ca="center">
								<p>0.50</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="left">
								<p>Calpain small subunit 1 (CSS1)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000218072</p>
							</c>
							<c ca="left">
								<p>SRPX</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>464</p>
							</c>
							<c ca="center">
								<p>21</p>
							</c>
							<c ca="center">
								<p>CTG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="left">
								<p>Sushi repeat-containing protein SRPX precursor</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000315602</p>
							</c>
							<c ca="left">
								<p>CHRNA3</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>505</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>CTG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="left">
								<p>Neuronal acetylcholine receptor protein, alpha-3 chain precursor</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000344134</p>
							</c>
							<c ca="left">
								<p>MOG</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>206</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>CTC</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="left">
								<p>Myelin-oligodendrocyte glycoprotein precursor</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000240617</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="center">
								<p>17</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>553</p>
							</c>
							<c ca="center">
								<p>22</p>
							</c>
							<c ca="center">
								<p>CTG</p>
							</c>
							<c ca="center">
								<p>0.88</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="left">
								<p>Unknown function</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000304072</p>
							</c>
							<c ca="left">
								<p>DDX54</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="center">
								<p>89</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>882</p>
							</c>
							<c ca="center">
								<p>97</p>
							</c>
							<c ca="center">
								<p>AAG</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="left">
								<p>DEAD-box protein 54</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000285814</p>
							</c>
							<c ca="left">
								<p>MKI67IP</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="center">
								<p>211</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>293</p>
							</c>
							<c ca="center">
								<p>79</p>
							</c>
							<c ca="center">
								<p>AAG</p>
							</c>
							<c ca="center">
								<p>0.60</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="left">
								<p>MKI67 (FHA domain) interacting nucleolar phosphoprotein</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000276212</p>
							</c>
							<c ca="left">
								<p>GPC3</p>
							</c>
							<c ca="center">
								<p>P</p>
							</c>
							<c ca="center">
								<p>25</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>580</p>
							</c>
							<c ca="center">
								<p>54</p>
							</c>
							<c ca="center">
								<p>CCG</p>
							</c>
							<c ca="center">
								<p>0.83</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="left">
								<p>Glypican-3 precursor (Intestinal protein OCI-5)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000312296</p>
							</c>
							<c ca="left">
								<p>CKAP4</p>
							</c>
							<c ca="center">
								<p>P</p>
							</c>
							<c ca="center">
								<p>42</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>602</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>CCG</p>
							</c>
							<c ca="center">
								<p>0.80</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="left">
								<p>Cytoskeleton-associated protein 4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000286910</p>
							</c>
							<c ca="left">
								<p>PCGF6</p>
							</c>
							<c ca="center">
								<p>P</p>
							</c>
							<c ca="center">
								<p>23</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>350</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>CCT</p>
							</c>
							<c ca="center">
								<p>0.40</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="left">
								<p>Polycomb group ring finger 6 isoform a</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000301653</p>
							</c>
							<c ca="left">
								<p>KRT16</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
							<c ca="center">
								<p>72</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>473</p>
							</c>
							<c ca="center">
								<p>248</p>
							</c>
							<c ca="center">
								<p>AGC</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="left">
								<p>Keratin, type I cytoskeletal 16 (cytokeratin 16)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ENSP00000307804</p>
							</c>
							<c ca="left">
								<p>MLLT3</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
							<c ca="center">
								<p>382</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>568</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>AGC/TCC</p>
							</c>
							<c ca="center">
								<p>0.11</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>AF-9 protein</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Refers to the Ensembl protein. Len., length. Size, size of repeat. <sup>&#8224;</sup>Number of ESTs covering the repeat. <sup>&#8225;</sup>Max run, longest pure codon run within the repeat-encoding sequence. <sup>&#167;</sup>Codon hom. (homogeneity), size of Max run divided by size of the repeat. AA, amino acid. A Size variant entry may list more than one variant size (for example, 15,16).</p>
					</tblfn>
				</tbl>
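			<p>The codon homogeneity values in the table follow directly from the footnote definition (Max run divided by repeat size). As a minimal sketch, assuming the repeat-encoding sequence is available as an in-frame nucleotide string, both quantities can be computed as follows. The function names and the example sequences are hypothetical illustrations, not the pipeline used in this study.</p>

```python
from itertools import groupby


def split_codons(cds):
    """Split an in-frame repeat-encoding sequence into codons.

    Trailing bases that do not complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def max_pure_run(cds):
    """Return (run_length, codons) for the longest pure codon run.

    When several codons tie for the longest run, all of them are
    returned in order of first appearance (compare table entries
    such as GAG/GAA).
    """
    runs = [(codon, len(list(group))) for codon, group in groupby(split_codons(cds))]
    best = max(length for _, length in runs)
    tied = list(dict.fromkeys(codon for codon, length in runs if length == best))
    return best, tied


def codon_homogeneity(cds):
    """Longest pure codon run divided by repeat size (both counted in codons)."""
    best, _ = max_pure_run(cds)
    return best / len(split_codons(cds))


if __name__ == "__main__":
    # Hypothetical poly-glutamine repeat of size 5 encoded as four CAG codons plus one CAA.
    seq = "CAGCAGCAGCAGCAA"
    print(max_pure_run(seq))                 # (4, ['CAG'])
    print(round(codon_homogeneity(seq), 2))  # 0.8
```

			<p>A repeat encoded entirely by one codon, such as the GAG runs listed for TMC4 or MAGEF1, gives a homogeneity of 1; repeats with mixed codons give lower values.</p>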
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>Databases of ESTs can be used to rapidly screen for potential polymorphisms in the products of eukaryotic genomes <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp> and are particularly useful for identifying microsatellite size variants <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. We have used this type of resource to obtain an overview of the polymorphisms associated with amino acid tandem repeats in human proteins, including potential expansion/contraction polymorphisms that may be associated with disease. We focused on variants supported by at least two different EST sequences, discarding those associated with a single EST, to minimize the effect of possible errors introduced by the EST sequencing procedure. Other studies have focused on the detection of polymorphisms in highly homogeneous DNA repeats in coding sequences <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, or in specific amino acid repeat datasets <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B22">22</abbr></abbrgrp>. While our results are generally consistent with these studies, we have taken a broader, genome-wide approach, using all human sequence information currently available in databases to obtain a more complete picture of the types of mutations found in different amino acid repeat types. The study by O'Dushlaine <it>et al</it>. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> has points in common with ours, as it also used ESTs to infer patterns of copy number variation in human protein-coding genes. In that study, however, repeat length polymorphism was investigated at the nucleotide level, whereas we investigate it at the amino acid level. 
Accordingly, whereas the analysis in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> is based on UniGene clusters, with a representative sequence from each cluster compared to all ESTs in the same cluster, we based our analysis on the Ensembl set of proteins and compared each of them against the entire set of ESTs. The two studies are complementary: about 30% of the polymorphic variants identified in our study map to polymorphic variants found in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
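			<p>The requirement that a size variant be supported by at least two different EST sequences can be illustrated with a small filter. This is a sketch under stated assumptions: each EST overlapping the repeat is assumed to have already been reduced to a single observed repeat size (for example, from an EST-to-protein alignment), and the function name, its parameters and the example counts are hypothetical.</p>

```python
from collections import Counter


def supported_size_variants(reference_size, est_sizes, min_support=2):
    """Return {size: n_ests} for repeat sizes that differ from the reference
    protein and are observed in at least `min_support` distinct ESTs.

    est_sizes holds one observed repeat size per EST covering the repeat
    (an assumed input format).
    """
    counts = Counter(est_sizes)
    return {
        size: n
        for size, n in sorted(counts.items())
        if size != reference_size and n >= min_support
    }


if __name__ == "__main__":
    # Hypothetical locus: reference repeat of 19 residues covered by 13 ESTs.
    observed = [19] * 10 + [16, 16, 18]
    print(supported_size_variants(19, observed))                 # {16: 2}
    print(supported_size_variants(19, observed, min_support=1))  # {16: 2, 18: 1}
```

			<p>Lowering <it>min_support</it> to 1 recovers single-EST variants, which are more numerous but also more exposed to sequencing errors.</p>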
			<p>It has been suggested that the evolutionary dynamics of microsatellite-type structures can be explained by a balance between expansion by slippage and growth interruption by point mutation <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The different frequencies of gap and substitution mutations observed in different types of repeats are, therefore, likely to reflect the relative strength of these two evolutionary forces at the DNA level, combined with the action of selection at the protein level. Many of the gap variants may have originated by trinucleotide slippage: they show significantly higher levels of codon homogeneity, a feature that has been linked to increased repeat expansion <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and to higher interspecific repeat divergence <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Unequal recombination has also been proposed to cause large size differences in a number of disease-associated poly-alanine tracts <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>, but it is unlikely to make a major contribution here, as the variants we describe mostly differ by a single repeat unit and are biased toward long pure codon runs. Among human amino acid repeats, glutamine repeats have a much higher propensity for size changes than other repeat types, with 88% of their polymorphic variants (15 out of 17) containing gaps. In contrast, proline repeats appear to be rarely affected by this type of mutation, with only 14% of their polymorphic variants (3 out of 22) containing gaps. Although the low expansion/contraction rate observed for proline repeats would seem to suggest a low rate of <it>de novo </it>formation of this kind of repeat, it is interesting to note that they are among the most common repeats. 
Their abundance may be related to a role in mediating protein-protein interactions, as proline-rich regions are often found in protein-protein interaction surfaces <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and proline tandem repeats are strongly associated with 'protein binding' functional annotations <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
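			<p>The gap percentages quoted above are proportions of polymorphic variants that change repeat length. A minimal sketch of this tally is shown below. It assumes each variant is given as the repeat region in the reference protein and in the EST-derived sequence, and it counts a variant as a gap variant whenever the two lengths differ. This is a simplification, since a single variant could carry both a gap and a substitution. The helper names and toy variants are hypothetical.</p>

```python
from collections import defaultdict


def is_gap_variant(ref_region, variant_region):
    """Treat a variant as a gap (expansion/contraction) if the repeat length changes."""
    return len(ref_region) != len(variant_region)


def gap_percentages(variants):
    """variants: iterable of (amino_acid, ref_region, variant_region) tuples.

    Returns {amino_acid: (n_gap, n_total, percent_gap)}.
    """
    tally = defaultdict(lambda: [0, 0])  # amino acid -> [gap count, total count]
    for aa, ref, var in variants:
        tally[aa][1] += 1
        if is_gap_variant(ref, var):
            tally[aa][0] += 1
    return {aa: (g, n, round(100.0 * g / n, 1)) for aa, (g, n) in tally.items()}


if __name__ == "__main__":
    toy = [
        ("Q", "QQQQQ", "QQQQQQ"),  # expansion by one unit
        ("Q", "QQQQQQ", "QQQQQ"),  # contraction by one unit
        ("P", "PPPPP", "PPLPP"),   # substitution, same length
    ]
    print(gap_percentages(toy))  # {'Q': (2, 2, 100.0), 'P': (0, 1, 0.0)}
    # Proportions reported in the text: 15 of 17 and 3 of 22.
    print(round(100 * 15 / 17), round(100 * 3 / 22))  # 88 14
```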
			<p>Our analysis captures the elevated levels of repeat size polymorphism previously reported in poly-glutamine disease-associated loci <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Of the 4 disease loci covered by &#8805;4 ESTs, namely spinocerebellar ataxia 6 (CACNA1A or SCA6), dentatorubral-pallidoluysian atrophy, Huntington's disease and spinocerebellar ataxia 7 (SCA7), we detected repeat size polymorphic variants for the first two. The lack of observed variability for Huntington's disease and SCA7 may reflect their low EST coverage (5 and 6 ESTs, respectively). Glutamine repeats associated with human disease share a number of characteristics: they are highly polymorphic, they are among the longest tandem amino acid repeats in the proteome, and they are encoded by highly homogeneous codon runs. A fourth characteristic, reported previously, is that they are generally much shorter in rodents than in humans, probably denoting a recent expansion in primates <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. Interestingly, we have detected several other loci with similar characteristics, which could therefore be good candidates for involvement in trinucleotide expansion diseases. For example, mRNA decapping enzyme 1B contains a poly-glutamine run of 10 units with a codon homogeneity of 0.9, and no repeat is detectable in its rodent homologues. Another example is zinc finger protein 384 (nuclear matrix transcription factor 4), which contains a run of 16 repeat units with a codon homogeneity of 0.88, whereas both mouse and rat have a shorter repeat of 7 units.</p>
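			<p>The four shared features listed above can be expressed as a simple screen over repeat summaries. The sketch below is illustrative only and is not the procedure used in this study: the Locus record, the is_candidate function and the default thresholds (at least 10 units, codon homogeneity of at least 0.85) are hypothetical choices, since the text does not state explicit cut-offs.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locus:
    """Hypothetical summary record for one poly-glutamine repeat."""
    name: str
    length: int                   # repeat size in the human protein
    homogeneity: float            # longest pure codon run / repeat size
    polymorphic: bool             # repeat size variants seen among human ESTs
    rodent_length: Optional[int]  # None: no detectable repeat in rodent homologues


def is_candidate(locus: Locus, min_length: int = 10, min_homogeneity: float = 0.85) -> bool:
    """Flag loci that share the features of known poly-glutamine disease repeats.

    The default thresholds are illustrative assumptions, not values from the study.
    """
    shorter_in_rodents = locus.rodent_length is None or locus.length > locus.rodent_length
    return (
        locus.polymorphic
        and locus.length >= min_length
        and locus.homogeneity >= min_homogeneity
        and shorter_in_rodents
    )


loci = [
    Locus("hypothetical_A", 16, 0.88, True, 7),
    Locus("hypothetical_B", 12, 0.50, True, None),
    Locus("hypothetical_C", 10, 0.90, False, 10),
]
print([locus.name for locus in loci if is_candidate(locus)])  # ['hypothetical_A']
```

			<p>Requiring the rodent repeat to be absent or shorter encodes the fourth feature, a probable recent expansion in primates; the thresholds could be tightened or relaxed to trade sensitivity for specificity.</p>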
			<p>About 5.2% of the repeats (115 out of 2,227) show some kind of polymorphism, but as the average EST coverage is only 27.4 ESTs per locus, many polymorphisms occurring in natural populations are likely to have been missed. A better estimate may be obtained from repeats covered by 100 or more ESTs (123 repeats with an average coverage of 179.8 ESTs). In this case, 21% of the repeats (26 out of 123) have at least one polymorphic variant. A study based on a selection of highly homogeneous DNA repeats in human coding sequences found that about 40% of 42 repeats tested by PCR amplification in 36 individuals were polymorphic <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. For comparison, we considered the repeats in our dataset with coverage of &#8805;100 ESTs that are encoded by sequences containing pure codon repeats of size &#8805;5 (17 different ones). The polymorphism level within these repeats is 17.3% when considering variants supported by &#8805;2 ESTs, and 35.5% when considering those supported by &#8805;1 EST; the latter is similar to the level obtained in <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Another study screened 42 DNA samples for polymorphisms in human sequences coding for more than 7 alanines and found that 24.5% of these sequences (24 out of 98) had triplet expansion/contraction polymorphic variants <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Using the same size cutoff, we detected repeat size variants for 40% of poly-alanine repeats (2 out of 5) with &#8805;100 EST coverage (or 3 out of 38 (13%) with &#8805;4 EST coverage). One of them, in 60S ribosomal protein L14 (RPL14), is common to both datasets.</p>
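			<p>Why limited coverage hides polymorphisms can be illustrated with a simplified binomial model, which is an assumption rather than part of the analysis above: each EST is treated as an independent draw of one allele, and a variant allele has population frequency p. Under the requirement of at least two supporting ESTs, the probability of detecting the variant at a locus covered by n ESTs is one minus the probability of observing it zero or one times.</p>

```python
from math import comb


def detection_probability(p: float, n: int, min_support: int = 2) -> float:
    """Probability that at least min_support of n ESTs carry a variant of frequency p.

    Assumes ESTs are independent draws of alleles (a simplification: EST
    libraries come from a limited number of individuals and tissues).
    """
    missed = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min_support))
    return 1.0 - missed


# Illustrative coverages: low, roughly the average, and the high-coverage subset.
for n in (4, 27, 180):
    print(n, round(detection_probability(0.05, n), 3))
```

			<p>For a variant at 5% frequency, this model gives a detection probability of roughly 0.39 with 27 ESTs but above 0.99 with 180 ESTs, which is qualitatively consistent with the higher polymorphism level observed in the high-coverage subset. Because ESTs from the same individual are not independent, the model overstates the effective sample size.</p>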
			<p>We also compared the intra-specific variability within repeat structures with that in adjacent regions. In general, the number of gap polymorphisms in the regions surrounding the repeats is five times lower than within the repeats, indicating much lower slippage activity outside the repeats. The number of substitutions, however, is generally comparable inside and outside the repeats. Considering that an important fraction of the repeats is likely to comprise neutral structures, many of which might have originated by slippage, it is somewhat surprising to observe a similarly relaxed level of negative selection inside and outside the repeats. An exception is leucine, which shows very few substitutions inside the repeat compared with the adjacent region, consistent with stronger functional constraints inside the repeat. Leucine tandem repeats are often found at the amino terminus of transmembrane receptor proteins <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, where it has been suggested that they could function as signal peptides <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Other proteins, such as Toll-like receptors, contain leucine-rich regions that can be involved in pathogen recognition <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. These putative functions could explain the reduced number of observed substitution polymorphisms.</p>
			<p>Deeper insight into the substitution patterns in repeats and adjacent regions can be obtained by analyzing the types of amino acid substitutions, as well as the frequencies of synonymous and non-synonymous substitutions, in the two types of region. We found that the vast majority of amino acid changes can be explained by a single nucleotide change, indicating a low incidence of multiple substitutions at the same site, as expected for intra-specific sequence comparisons. This analysis also showed that a broad range of amino acid replacements occurs in the polymorphic variants of both repeats and adjacent sequences. The effect of selection can be better analyzed by comparing non-synonymous and synonymous substitution frequencies, as only the former are related to selective constraints at the protein sequence level. Our results show that non-synonymous substitution frequencies are lower than synonymous ones, both in repeats and in adjacent sequences, indicating that selection plays a role in shaping the amino acid content of these regions. Overall, there is about 1 non-synonymous substitution per non-synonymous site for every 1.5 synonymous substitutions per synonymous site, a ratio higher than that typically observed in inter-specific comparisons <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. A higher non-synonymous to synonymous substitution ratio in intra-specific than in inter-specific comparisons is not unexpected in light of several recent reports <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>; one possible reason is the persistence in populations of slightly deleterious non-synonymous mutations that have yet to be eliminated by selection <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. 
In comparing repeats and adjacent regions, we observed few differences in non-synonymous versus synonymous substitution frequencies. Together with the data on amino acid substitutions, this indicates that, at least at the intra-specific level, the selective constraints on substitutions are not very different inside the repeats and in the regions adjacent to them.</p>
			<p>An interesting question for future studies will be whether similar conclusions can be drawn from inter-specific comparisons. It has previously been noted that regions adjacent to poly-glutamine tracts in human and mouse proteins tend to show high divergence rates, particularly when the repeats are not conserved between the two species <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. This may indicate that repeats tend to originate in regions subject to weak selective constraints, where disruption of protein structure or function is less severe. Another possibility is that many of the adjacent regions are old degenerate repeats which, in the absence of selection, are in a rapidly evolving phase. The expected increase in available sequence and variability data will contribute to a deeper understanding of these highly mutable sequences.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>We have identified a large number of human amino acid repeat variants and classified them according to the type of mutation affecting the repeat: amino acid substitution or expansion/contraction. This has allowed us to quantify the mutation propensity of regions located within and outside tandem repeats, and of repeats formed by different amino acids. The analysis has also led to the identification of new candidate disease genes.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<sec>
				<st>
					<p>Sequence databases</p>
				</st>
				<p>Human protein and cDNA sequences were extracted from the Ensembl database (NCBI35-based release) <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The initial set comprised 33,860 peptide sequences. EST sequences were obtained from the NCBI-EST database (3 February 2005) at the National Center for Biotechnology Information <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, which contained 5,430,499 human EST sequences.</p>
			</sec>
			<sec>
				<st>
					<p>Repeat count and analysis</p>
				</st>
				<p>We used in-house programs to identify all single amino acid tandem repeats of five or more residues in the human proteins and to extract the DNA sequences encoding them. We identified repeats in 5,467 different proteins. For each repeat we stored the repeated amino acid, its position in the sequence, the repeat length, the length of the longest pure codon run (the longest stretch of identical consecutive codons) and the codon(s) in the longest pure codon run(s). In specific cases we also retrieved the equivalent repeat in the mouse and rat orthologous sequences using BLASTP <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> at NCBI. For each repeat we calculated codon homogeneity as the fraction of the repeat occupied by the longest pure codon run. The non-parametric Kolmogorov-Smirnov test was used to assess differences in codon homogeneity between samples.</p>
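				<p>The repeat scan and the codon homogeneity measure described above can be sketched as follows. This is not the authors' program: it assumes that the coding sequence is supplied without its stop codon and is exactly three times the protein length, and the names find_repeats, longest_pure_runs and Repeat are hypothetical.</p>

```python
import re
from dataclasses import dataclass, field

MIN_REPEAT = 5  # single amino acid tandem repeats of size five or longer


@dataclass
class Repeat:
    residue: str      # repeated amino acid
    start: int        # 0-based position in the protein
    length: int       # repeat size in residues
    longest_run: int  # length of the longest pure codon run
    run_codons: list = field(default_factory=list)  # codon(s) forming the longest run(s)

    @property
    def codon_homogeneity(self) -> float:
        # Fraction of the repeat occupied by the longest pure codon run.
        return self.longest_run / self.length


def longest_pure_runs(codons: list) -> tuple:
    """Return the longest stretch of identical consecutive codons and its codon(s)."""
    best, best_codons, run = 0, [], 0
    for i, codon in enumerate(codons):
        run = run + 1 if i > 0 and codon == codons[i - 1] else 1
        if run > best:
            best, best_codons = run, [codon]
        elif run == best and codon not in best_codons:
            best_codons.append(codon)
    return best, best_codons


def find_repeats(protein: str, cds: str, min_len: int = MIN_REPEAT) -> list:
    """Locate homopeptides of at least min_len residues and summarise their codons."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))  # a residue followed by min_len-1 copies
    repeats = []
    for m in pattern.finditer(protein):
        start, end = m.span()
        codons = [cds[3 * i: 3 * i + 3] for i in range(start, end)]
        longest, run_codons = longest_pure_runs(codons)
        repeats.append(Repeat(m.group(1), start, end - start, longest, run_codons))
    return repeats


protein = "M" + "Q" * 10 + "A"
cds = "ATG" + "CAG" * 9 + "CAA" + "GCT"
for r in find_repeats(protein, cds):
    print(r.residue, r.start, r.length, r.run_codons, round(r.codon_homogeneity, 2))
```

				<p>In the example, a run of 10 glutamines encoded by nine CAG codons followed by one CAA gives a codon homogeneity of 0.9. The homogeneity values of two samples could then be compared with a two-sample Kolmogorov-Smirnov test, for instance scipy.stats.ks_2samp.</p>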
			</sec>
			<sec>
				<st>
					<p>EST mapping</p>
				</st>
				<p>We mapped all human ESTs to the repeat regions in the reference proteins using the program TBLASTN <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Each repeat region comprised the perfect tandem repeat plus 15 nucleotides on each side. We considered only EST matches that covered the entire repeat region with a percent identity &gt;90%. This may prevent the detection of highly divergent polymorphic variants but limits the chance of matches between unrelated sequences. For analysis we selected the repeat regions covered by at least 4 different ESTs (2,227 cases). We also retrieved the cases covered by at least 100 different ESTs (123 cases).</p>
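				<p>The mapping filters can be sketched as below, under hypothetical assumptions not stated in the text: each padded repeat region (the repeat plus 5 residues, that is 15 nucleotides, on each side) is used as a TBLASTN query, and hits are written in the 12-column tabular format produced by -m 8 in legacy BLAST or -outfmt 6 in BLAST+. The function names are illustrative.</p>

```python
from collections import defaultdict

FLANK_NT = 15  # nucleotides added on each side of the repeat


def region_length(repeat_len: int, flank_nt: int = FLANK_NT) -> int:
    """Length in residues of the padded query: the repeat plus flanks on both sides."""
    return repeat_len + 2 * (flank_nt // 3)


def ests_per_region(tabular_lines, region_lengths: dict, min_identity: float = 90.0) -> dict:
    """Collect ESTs whose hit spans the whole padded region with identity above min_identity.

    Columns follow the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = defaultdict(set)
    for line in tabular_lines:
        cols = line.rstrip("\n").split("\t")
        region, est = cols[0], cols[1]
        pident, qstart, qend = float(cols[2]), int(cols[6]), int(cols[7])
        covers_region = qstart == 1 and qend == region_lengths[region]
        if covers_region and pident > min_identity:
            hits[region].add(est)
    return hits


def select_regions(hits: dict, min_ests: int = 4) -> dict:
    """Keep regions supported by at least min_ests different ESTs."""
    return {region: ests for region, ests in hits.items() if len(ests) >= min_ests}
```

				<p>Requiring qstart to be 1 and qend to equal the region length enforces coverage of the entire region, so a hit that stops inside a flank is discarded; the strict comparison on pident mirrors the &gt;90% identity criterion. Calling select_regions with min_ests=100 would give the high-coverage subset.</p>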
			</sec>
			<sec>
				<st>
					<p>Detection of polymorphic variants</p>
				</st>
				<p>Polymorphic variants were defined as changes relative to the reference sequence supported by at least 2 independent ESTs (137 within repeats, 111 in upstream regions, 107 in downstream regions). We counted the different types of polymorphism within the tandem repeats and in sequences of the same length immediately adjacent to the repeats, discarding cases in which the adjacent regions also contained repeats. For comparison, we also analyzed polymorphic variants supported by at least 4 ESTs (52 within repeats, 24 in upstream regions, 37 in downstream regions). Variants were classified as those involving expansions and/or contractions (gaps or indels) and those involving only amino acid substitutions.</p>
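				<p>A simplified version of this variant-calling rule is sketched below. It assumes, hypothetically, that each EST has already been translated over the region, treats distinct EST identifiers as independent support, and labels a variant as a gap variant whenever its length differs from the reference; under this simplification, an equal-length variant produced by compensating indels would be mislabelled as a substitution.</p>

```python
from collections import defaultdict


def call_variants(reference: str, est_seqs: dict, min_support: int = 2) -> dict:
    """Return {variant_sequence: "gap" or "substitution"} for well-supported variants.

    reference: translated reference sequence of the region.
    est_seqs: {est_id: translated EST sequence over the same region}.
    """
    support = defaultdict(set)
    for est_id, seq in est_seqs.items():
        if seq != reference:
            support[seq].add(est_id)
    calls = {}
    for seq, ids in support.items():
        if len(ids) >= min_support:
            # Expansions/contractions change the region length; otherwise only substitutions.
            calls[seq] = "gap" if len(seq) != len(reference) else "substitution"
    return calls


ref = "Q" * 10
ests = {
    "e1": "Q" * 11, "e2": "Q" * 11,          # one-unit expansion, 2 ESTs
    "e3": "QQQQHQQQQQ", "e4": "QQQQHQQQQQ",  # Q-to-H substitution, 2 ESTs
    "e5": "Q" * 9,                           # contraction seen in a single EST
    "e6": ref,                               # matches the reference
}
print(call_variants(ref, ests))
```

				<p>Passing min_support=4 gives the stricter dataset also described above; tallying the labels per repeat type yields the fraction of polymorphic variants that contain gaps.</p>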
			</sec>
			<sec>
				<st>
					<p>Type and location of amino acid substitutions</p>
				</st>
				<p>We counted the observed frequency of all possible types of amino acid substitution, in the repeat and adjacent sequence polymorphic variants, for the amino acids most frequently found in tandem repeats (A, E, G, L, P, S, K, Q). No strong differences were observed between the two datasets. We also recorded the position of each substitution within the repeat, assigning it to one of the following classes: pos = 1 (first position of the repeat), pos = 2 (second position), pos = -1 (last position), pos = -2 (position before the last one) and middle (all remaining positions). We calculated the expected values under a random distribution; for example, in a repeat of size 5 each class has an expected value of 0.2, and in a repeat of size 6 all classes have an expected value of about 0.17 (1/6) except the middle, which has an expected value of about 0.33 (2/6). The total expected values for each group were compared with the observed values using a chi-square test. No significant differences were found.</p>
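				<p>The expected values and the test can be made concrete with a short sketch, which is not the authors' code. Substitutions are supplied as hypothetical (position, repeat size) pairs with 1-based positions, and repeats are assumed to have at least 5 residues so that the five classes never overlap. With five classes the test has 4 degrees of freedom, and for an even number of degrees of freedom the chi-square upper-tail probability has a closed form, here exp(-x/2)(1 + x/2), so no statistics library is needed.</p>

```python
import math
from collections import Counter

CLASSES = ("pos=1", "pos=2", "middle", "pos=-2", "pos=-1")


def position_class(pos: int, n: int) -> str:
    """Class of a 1-based position in a repeat of size n (n is at least 5)."""
    if pos == 1:
        return "pos=1"
    if pos == 2:
        return "pos=2"
    if pos == n:
        return "pos=-1"
    if pos == n - 1:
        return "pos=-2"
    return "middle"


def expected_fractions(n: int) -> dict:
    """Expected share of substitutions per class if all positions are equally likely."""
    edge = 1 / n
    return {"pos=1": edge, "pos=2": edge, "middle": (n - 4) / n, "pos=-2": edge, "pos=-1": edge}


def chi_square_positions(subs) -> tuple:
    """Chi-square test of observed versus expected class counts.

    subs: iterable of (position, repeat_size) pairs, one per substitution.
    Returns (statistic, p_value) with 4 degrees of freedom.
    """
    observed, expected = Counter(), Counter()
    for pos, n in subs:
        observed[position_class(pos, n)] += 1
        for cls, frac in expected_fractions(n).items():
            expected[cls] += frac
    stat = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in CLASSES)
    # Upper-tail probability of a chi-square with 4 df: exp(-x/2) * (1 + x/2).
    p_value = math.exp(-stat / 2) * (1 + stat / 2)
    return stat, p_value


stat, p = chi_square_positions([(1, 5), (3, 6), (6, 6), (4, 8), (2, 7), (5, 9)])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

				<p>With small expected counts per class the chi-square approximation is rough, so pooling substitutions across repeats, as the summed expected values do here, helps keep the test meaningful.</p>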
			</sec>
			<sec>
				<st>
					<p>Synonymous and non-synonymous nucleotide substitutions</p>
				</st>
				<p>We counted the observed numbers of synonymous and non-synonymous nucleotide substitutions in the non-redundant EST dataset matching the repeats and their adjacent regions. Here we included substitutions represented by a single EST as well as those represented by several identical ESTs, in order to obtain a dataset large enough to estimate and compare substitution frequencies. Some of these changes could be due to sequencing errors, but such errors should affect synonymous and non-synonymous substitution rates inside and outside the repeats in the same manner. As we still detected differences between the two types of rates, and as our main goal was to compare different regions and types of homopeptides, we used all the observed mutations in the non-redundant EST dataset. To maximize the reliability of the alignments we discarded ESTs containing gaps (3%). The dataset comprised 8,196 different non-redundant ESTs. We counted the number of synonymous and non-synonymous positions analyzed to obtain the frequency of substitutions of each kind. Overall, we analyzed 430,161 nucleotide positions, 107,845 of which were synonymous and 322,316 non-synonymous. There were 1,663 synonymous substitutions (1.54% of synonymous sites) and 3,458 non-synonymous substitutions (1.07% of non-synonymous sites). We also extracted these results separately for sequences containing each amino acid repeat type.</p>
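				<p>Site and substitution counting of this kind can be sketched as follows. The sketch assumes a Nei-Gojobori-style convention that the text does not state explicitly: each codon contributes, per position, the fraction of the three possible single-nucleotide changes that preserve the amino acid; changes to a stop codon count as non-synonymous; and codons differing at more than one position are skipped. The function names are hypothetical.</p>

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Number of synonymous sites in a codon (between 0 and 3)."""
    aa = CODE[codon]
    sites = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def count_sites(cds: str) -> tuple:
    """Return (synonymous, non-synonymous) site totals over complete, unambiguous codons."""
    syn = nonsyn = 0.0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in CODE:
            s = synonymous_sites(codon)
            syn += s
            nonsyn += 3 - s
    return syn, nonsyn


def count_substitutions(ref: str, est: str) -> tuple:
    """Return (synonymous, non-synonymous) counts for codons differing at exactly one position."""
    syn = nonsyn = 0
    for i in range(0, min(len(ref), len(est)) - 2, 3):
        a, b = ref[i:i + 3], est[i:i + 3]
        if a in CODE and b in CODE and sum(x != y for x, y in zip(a, b)) == 1:
            if CODE[a] == CODE[b]:
                syn += 1
            else:
                nonsyn += 1
    return syn, nonsyn


# Frequencies per site from the totals reported above.
syn_freq = 1663 / 107845
non_freq = 3458 / 322316
print(f"{syn_freq:.2%} {non_freq:.2%} ratio {non_freq / syn_freq:.2f}")  # 1.54% 1.07% ratio 0.70
```

				<p>The ratio of about 0.70 between the two per-site frequencies corresponds to roughly one non-synonymous substitution for every 1.5 synonymous ones.</p>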
			</sec>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>The following additional data are available with the online version of this manuscript. Additional data file <supplr sid="S1">1</supplr> contains a listing of substitution polymorphic variants within tandem amino acid repeats (subs_rep), a listing of gap polymorphic variants within tandem amino acid repeats (gaps_rep), a listing of substitution polymorphic variants in repeat adjacent sequences (subs_adj), a listing of gap polymorphic variants in repeat adjacent sequences (gaps_adj), data on observed and expected substitution positions (subs_position) and data on synonymous and non-synonymous substitutions (nucl_subs).</p>
			<suppl id="S1">
				<title>
					<p>Additional Data File 1</p>
				</title>
				<caption>
					<p>Substitution and gap polymorphic variants within tandem amino acid repeats and in repeat adjacent sequences, as well as observed and expected substitution positions, and data on synonymous and non-synonymous substitutions.</p>
				</caption>
				<text>
					<p>In the file, subs_rep is a list of substitution polymorphic variants within tandem amino acid repeats, gaps_rep is a list of gap polymorphic variants within tandem amino acid repeats, subs_adj is a list of substitution polymorphic variants in repeat adjacent sequences, gaps_adj is a list of gap polymorphic variants in repeat adjacent sequences, subs_position contains the observed and expected substitution positions and nucl_subs contains data on synonymous and non-synonymous substitutions.</p>
				</text>
				<file name="gb-2006-7-4-r33-S1.xls">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We acknowledge support from the Ram&#243;n y Cajal program and Fundaci&#243; ICREA (MMA) and from Universit&#224; di Bologna (LM). We are grateful to Celine Poux, Nicol&#225;s Bellora and Dom&#232;nec Farr&#233; for their useful comments. This research was funded by grants BIO2002-04426-C02-01 and BIO2003-05073 from Ministerio de Ciencia y Tecnolog&#237;a (Spain), and by the STAR European Project.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Amino acid runs in eukaryotic proteomes and disease associations.</p>
				</title>
				<aug>
					<au>
						<snm>Karlin</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Brocchieri</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Bergman</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mrazek</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Gentles</snm>
						<fnm>AJ</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>333</fpage>
				<lpage>338</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">11782551</pubid>
						<pubid idtype="pmcid">117561</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Comparative analysis of amino acid repeats in rodents and humans.</p>
				</title>
				<aug>
					<au>
						<snm>Alb&#224;</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Guig&#243;</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>549</fpage>
				<lpage>554</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">383298</pubid>
						<pubid idtype="pmpid" link="fulltext">15059995</pubid>
						<pubid idtype="doi">10.1101/gr.1925704</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Persistence of repeated sequences that evolve by replication slippage.</p>
				</title>
				<aug>
					<au>
						<snm>Tachida</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Iizuka</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>1992</pubdate>
				<volume>131</volume>
				<fpage>471</fpage>
				<lpage>478</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">1644281</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Microsatellites within genes: structure, function and evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Li</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Korol</snm>
						<fnm>AB</fnm>
					</au>
					<au>
						<snm>Fahima</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Nevo</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2004</pubdate>
				<volume>21</volume>
				<fpage>991</fpage>
				<lpage>1007</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msh073</pubid>
						<pubid idtype="pmpid" link="fulltext">14963101</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Conservation of polyglutamine tract size between mouse and human depends on codon interruption.</p>
				</title>
				<aug>
					<au>
						<snm>Alb&#224;</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Santib&#225;&#241;ez-Koref</snm>
						<fnm>MF</fnm>
					</au>
					<au>
						<snm>Hancock</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1999</pubdate>
				<volume>16</volume>
				<fpage>1641</fpage>
				<lpage>1644</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10555295</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Simple sequence repeats as a source of quantitative genetic variation.</p>
				</title>
				<aug>
					<au>
						<snm>Kashi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>King</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Soller</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>1997</pubdate>
				<volume>13</volume>
				<fpage>74</fpage>
				<lpage>78</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(97)01008-1</pubid>
						<pubid idtype="pmpid" link="fulltext">9055609</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Molecular origins of rapid and continuous morphological evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Fondon</snm>
						<fnm>JW</fnm>
						<suf>3rd</suf>
					</au>
					<au>
						<snm>Garner</snm>
						<fnm>HR</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2004</pubdate>
				<volume>101</volume>
				<fpage>18058</fpage>
				<lpage>18063</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">539791</pubid>
						<pubid idtype="pmpid" link="fulltext">15596718</pubid>
						<pubid idtype="doi">10.1073/pnas.0408118101</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Molecular basis of genetic instability of triplet repeats.</p>
				</title>
				<aug>
					<au>
						<snm>Wells</snm>
						<fnm>RD</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1996</pubdate>
				<volume>271</volume>
				<fpage>2875</fpage>
				<lpage>2878</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8621672</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Diseases of unstable repeat expansion: mechanisms and common principles.</p>
				</title>
				<aug>
					<au>
						<snm>Gatchel</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Zoghbi</snm>
						<fnm>HY</fnm>
					</au>
				</aug>
				<source>Nat Rev Genet</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>743</fpage>
				<lpage>755</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">16205714</pubid>
						<pubid idtype="doi">10.1038/nrg1691</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Population variation analysis at nine loci containing expressed trinucleotide repeats.</p>
				</title>
				<aug>
					<au>
						<snm>Jodice</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Giovannone</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Calabresi</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Bellocchi</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Terrenato</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Novelletto</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Ann Hum Genet</source>
				<pubdate>1997</pubdate>
				<volume>61</volume>
				<fpage>425</fpage>
				<lpage>438</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1017/S0003480097006489</pubid>
						<pubid idtype="pmpid" link="fulltext">9459004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Dynamics of CAG repeat loci revealed by the analysis of their variability.</p>
				</title>
				<aug>
					<au>
						<snm>Andr&#233;s</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Lao</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Soldevila</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Calafell</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Bertranpetit</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Hum Mutat</source>
				<pubdate>2003</pubdate>
				<volume>21</volume>
				<fpage>61</fpage>
				<lpage>70</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">12497632</pubid>
						<pubid idtype="doi">10.1002/humu.10151</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Ensembl 2005.</p>
				</title>
				<aug>
					<au>
						<snm>Hubbard</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Andrews</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Caccamo</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Cameron</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Clamp</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Clarke</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Coates</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Cox</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Cunningham</snm>
						<fnm>F</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<fpage>D447</fpage>
				<lpage>D453</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">540092</pubid>
						<pubid idtype="pmpid" link="fulltext">15608235</pubid>
						<pubid idtype="doi">10.1093/nar/gki138</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Database resources of the National Center for Biotechnology Information.</p>
				</title>
				<aug>
					<au>
						<snm>Wheeler</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Barrett</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Benson</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Bryant</snm>
						<fnm>SH</fnm>
					</au>
					<au>
						<snm>Canese</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>DiCuccio</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Edgar</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Federhen</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Helmberg</snm>
						<fnm>W</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<fpage>D39</fpage>
				<lpage>D45</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">540016</pubid>
						<pubid idtype="pmpid" link="fulltext">15608222</pubid>
						<pubid idtype="doi">10.1093/nar/gki062</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Sch&#228;ffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">146917</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Mining SNPs from EST databases.</p>
				</title>
				<aug>
					<au>
						<snm>Picoult-Newberg</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Ideker</snm>
						<fnm>TE</fnm>
					</au>
					<au>
						<snm>Pohl</snm>
						<fnm>MG</fnm>
					</au>
					<au>
						<snm>Taylor</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Donaldson</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Nickerson</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Boyce-Jacino</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>1999</pubdate>
				<volume>9</volume>
				<fpage>167</fpage>
				<lpage>174</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">310719</pubid>
						<pubid idtype="pmpid" link="fulltext">10022981</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Single nucleotide polymorphisms associated with rat expressed sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Guryev</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Berezikov</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Malik</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Plasterk</snm>
						<fnm>RH</fnm>
					</au>
					<au>
						<snm>Cuppen</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>1438</fpage>
				<lpage>1443</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">442160</pubid>
						<pubid idtype="pmpid" link="fulltext">15231757</pubid>
						<pubid idtype="doi">10.1101/gr.2154304</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Integration of the rat recombination and EST maps in the rat genomic sequence and comparative mapping analysis with the mouse genome.</p>
				</title>
				<aug>
					<au>
						<snm>Wilder</snm>
						<fnm>SP</fnm>
					</au>
					<au>
						<snm>Bihoreau</snm>
						<fnm>MT</fnm>
					</au>
					<au>
						<snm>Argoud</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Watanabe</snm>
						<fnm>TK</fnm>
					</au>
					<au>
						<snm>Lathrop</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gauguier</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>758</fpage>
				<lpage>765</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">383323</pubid>
						<pubid idtype="pmpid" link="fulltext">15060020</pubid>
						<pubid idtype="doi">10.1101/gr.2001604</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley.</p>
				</title>
				<aug>
					<au>
						<snm>La Rota</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kantety</snm>
						<fnm>RV</fnm>
					</au>
					<au>
						<snm>Yu</snm>
						<fnm>JK</fnm>
					</au>
					<au>
						<snm>Sorrells</snm>
						<fnm>ME</fnm>
					</au>
				</aug>
				<source>BMC Genomics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>23</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">550658</pubid>
						<pubid idtype="pmpid" link="fulltext">15720707</pubid>
						<pubid idtype="doi">10.1186/1471-2164-6-23</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Tandem repeat copy-number variation in protein-coding regions of human genes.</p>
				</title>
				<aug>
					<au>
						<snm>O'Dushlaine</snm>
						<fnm>CT</fnm>
					</au>
					<au>
						<snm>Edwards</snm>
						<fnm>RJ</fnm>
					</au>
					<au>
						<snm>Park</snm>
						<fnm>SD</fnm>
					</au>
					<au>
						<snm>Shields</snm>
						<fnm>DC</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>R69</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1273636</pubid>
						<pubid idtype="pmpid" link="fulltext">16086851</pubid>
						<pubid idtype="doi">10.1186/gb-2005-6-8-r69</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Rate and directionality of mutations and effects of allele size constraints at anonymous, gene-associated, and disease-causing trinucleotide loci.</p>
				</title>
				<aug>
					<au>
						<snm>Deka</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Guangyun</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Smelser</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Zhong</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kimmel</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Chakraborty</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1999</pubdate>
				<volume>16</volume>
				<fpage>1166</fpage>
				<lpage>1177</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10486972</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Repeat polymorphisms within gene regions: phenotypic and evolutionary implications.</p>
				</title>
				<aug>
					<au>
						<snm>Wren</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Forgacs</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Fondon</snm>
						<fnm>JW</fnm>
						<suf>3rd</suf>
					</au>
					<au>
						<snm>Pertsemlidis</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Cheng</snm>
						<fnm>SY</fnm>
					</au>
					<au>
						<snm>Gallardo</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Williams</snm>
						<fnm>RS</fnm>
					</au>
					<au>
						<snm>Shohet</snm>
						<fnm>RV</fnm>
					</au>
					<au>
						<snm>Minna</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Garner</snm>
						<fnm>HR</fnm>
					</au>
				</aug>
				<source>Am J Hum Genet</source>
				<pubdate>2000</pubdate>
				<volume>67</volume>
				<fpage>345</fpage>
				<lpage>356</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1287183</pubid>
						<pubid idtype="pmpid" link="fulltext">10889045</pubid>
						<pubid idtype="doi">10.1086/303013</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Polymorphism, shared functions and convergent evolution of genes with sequences coding for polyalanine domains.</p>
				</title>
				<aug>
					<au>
						<snm>Lavoie</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Debeane</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Trinh</snm>
						<fnm>QD</fnm>
					</au>
					<au>
						<snm>Turcotte</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Corbeil-Girard</snm>
						<fnm>LP</fnm>
					</au>
					<au>
						<snm>Dicaire</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Saint-Denis</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Page</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rouleau</snm>
						<fnm>GA</fnm>
					</au>
					<au>
						<snm>Brais</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Hum Mol Genet</source>
				<pubdate>2003</pubdate>
				<volume>12</volume>
				<fpage>2967</fpage>
				<lpage>2979</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/hmg/ddg329</pubid>
						<pubid idtype="pmpid" link="fulltext">14519685</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations.</p>
				</title>
				<aug>
					<au>
						<snm>Kruglyak</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Durrett</snm>
						<fnm>RT</fnm>
					</au>
					<au>
						<snm>Schug</snm>
						<fnm>MD</fnm>
					</au>
					<au>
						<snm>Aquadro</snm>
						<fnm>CF</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>10774</fpage>
				<lpage>10778</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">27971</pubid>
						<pubid idtype="pmpid" link="fulltext">9724780</pubid>
						<pubid idtype="doi">10.1073/pnas.95.18.10774</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>A relationship between lengths of microsatellites and nearby substitution rates in mammalian genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Santib&#225;&#241;ez-Koref</snm>
						<fnm>MF</fnm>
					</au>
					<au>
						<snm>Gangeswaran</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Hancock</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2001</pubdate>
				<volume>18</volume>
				<fpage>2119</fpage>
				<lpage>2123</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11606708</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>The effect of FMR1 CGG repeat interruptions on mutation frequency as measured by sperm typing.</p>
				</title>
				<aug>
					<au>
						<snm>Kunst</snm>
						<fnm>CB</fnm>
					</au>
					<au>
						<snm>Leeflang</snm>
						<fnm>EP</fnm>
					</au>
					<au>
						<snm>Iber</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Arnheim</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Warren</snm>
						<fnm>ST</fnm>
					</au>
				</aug>
				<source>J Med Genet</source>
				<pubdate>1997</pubdate>
				<volume>34</volume>
				<fpage>627</fpage>
				<lpage>631</lpage>
				<xrefbib>
					<pubid idtype="pmpid">9279752</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Polyalanine expansion in synpolydactyly might result from unequal crossing-over of HOXD13.</p>
				</title>
				<aug>
					<au>
						<snm>Warren</snm>
						<fnm>ST</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1997</pubdate>
				<volume>275</volume>
				<fpage>408</fpage>
				<lpage>409</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.275.5298.408</pubid>
						<pubid idtype="pmpid" link="fulltext">9005557</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>A novel stable polyalanine [poly(A)] expansion in the HOXA13 gene associated with hand-foot-genital syndrome: proper function of poly(A)-harbouring transcription factors depends on a critical repeat length?</p>
				</title>
				<aug>
					<au>
						<snm>Utsch</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Becker</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Brock</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Lentze</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Bidlingmaier</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Ludwig</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Hum Genet</source>
				<pubdate>2002</pubdate>
				<volume>110</volume>
				<fpage>488</fpage>
				<lpage>494</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/s00439-002-0712-8</pubid>
						<pubid idtype="pmpid" link="fulltext">12073020</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>The importance of being proline: the interaction of proline-rich motifs in signaling proteins with their cognate domains.</p>
				</title>
				<aug>
					<au>
						<snm>Kay</snm>
						<fnm>BK</fnm>
					</au>
					<au>
						<snm>Williamson</snm>
						<fnm>MP</fnm>
					</au>
					<au>
						<snm>Sudol</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>FASEB J</source>
				<pubdate>2000</pubdate>
				<volume>14</volume>
				<fpage>231</fpage>
				<lpage>241</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10657980</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Conservation of human disease genes in the rat genome.</p>
				</title>
				<aug>
					<au>
						<snm>Huang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Winter</snm>
						<fnm>EE</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Weinstock</snm>
						<fnm>KG</fnm>
					</au>
					<au>
						<snm>Xing</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Goodstadt</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Stenson</snm>
						<fnm>PD</fnm>
					</au>
					<au>
						<snm>Cooper</snm>
						<fnm>DN</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Alb&#224;</snm>
						<fnm>MM</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>R47</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">463309</pubid>
						<pubid idtype="pmpid" link="fulltext">15239832</pubid>
						<pubid idtype="doi">10.1186/gb-2004-5-7-r47</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Genome sequence of the Brown Norway rat yields insights into mammalian evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Gibbs</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Weinstock</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>Metzker</snm>
						<fnm>ML</fnm>
					</au>
					<au>
						<snm>Muzny</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Sodergren</snm>
						<fnm>EJ</fnm>
					</au>
					<au>
						<snm>Scherer</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Scott</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Steffen</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Worley</snm>
						<fnm>KC</fnm>
					</au>
					<au>
						<snm>Burch</snm>
						<fnm>PE</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2004</pubdate>
				<volume>428</volume>
				<fpage>493</fpage>
				<lpage>521</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature02426</pubid>
						<pubid idtype="pmpid" link="fulltext">15057822</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Involvement of leucine residues at positions 107, 112, and 115 in a leucine-rich repeat motif of human Toll-like receptor 2 in the recognition of diacylated lipoproteins and lipopeptides and <it>Staphylococcus aureus</it> peptidoglycans.</p>
				</title>
				<aug>
					<au>
						<snm>Fujita</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Into</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Yasuda</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Okusawa</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Hamahira</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kuroki</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Eto</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Nisizawa</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Shibata</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>J Immunol</source>
				<pubdate>2003</pubdate>
				<volume>171</volume>
				<fpage>3675</fpage>
				<lpage>3683</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">14500665</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Initial sequencing and comparative analysis of the mouse genome.</p>
				</title>
				<aug>
					<au>
						<cnm>Mouse Genome Sequencing Consortium</cnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>420</volume>
				<fpage>520</fpage>
				<lpage>562</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01262</pubid>
						<pubid idtype="pmpid" link="fulltext">12466850</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Time dependency of molecular rate estimates and systematic overestimation of recent divergence times.</p>
				</title>
				<aug>
					<au>
						<snm>Ho</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Phillips</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Cooper</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Drummond</snm>
						<fnm>AJ</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2005</pubdate>
				<volume>22</volume>
				<fpage>1561</fpage>
				<lpage>1568</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msi145</pubid>
						<pubid idtype="pmpid" link="fulltext">15814826</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Relativity for molecular clocks.</p>
				</title>
				<aug>
					<au>
						<snm>Penny</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2005</pubdate>
				<volume>436</volume>
				<fpage>183</fpage>
				<lpage>184</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/436183a</pubid>
						<pubid idtype="pmpid" link="fulltext">16015312</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>A role for selection in regulating the evolutionary emergence of disease-causing and other coding CAG repeats in humans and mice.</p>
				</title>
				<aug>
					<au>
						<snm>Hancock</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Worthey</snm>
						<fnm>EA</fnm>
					</au>
					<au>
						<snm>Santib&#225;&#241;ez-Koref</snm>
						<fnm>MF</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2001</pubdate>
				<volume>18</volume>
				<fpage>1014</fpage>
				<lpage>1023</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11371590</pubid>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
