<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2002-3-10-research0054</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Genomic analysis of membrane protein families: abundance and conserved motifs</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Liu</snm>
               <fnm>Yang</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2">
               <snm>Engelman</snm>
               <mi>M</mi>
               <fnm>Donald</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3" ca="yes">
               <snm>Gerstein</snm>
               <fnm>Mark</fnm>
               <insr iid="I1"/>
               <email>Mark.Gerstein@yale.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2002</pubdate>
         <volume>3</volume>
         <issue>10</issue>
         <fpage>research0054.1</fpage>
         <lpage>research0054.12</lpage>
         <url>http://genomebiology.com/2002/3/10/research/0054</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2002-3-10-research0054</pubid>
               <pubid idtype="pmpid">12372142</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>21</day>
               <month>5</month>
               <year>2002</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>26</day>
               <month>7</month>
               <year>2002</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>7</day>
               <month>8</month>
               <year>2002</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>19</day>
               <month>9</month>
               <year>2002</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2002</year>
         <collab>Liu et al., licensee BioMed Central Ltd</collab>
      </cpyrt>
      <shorttitle>
         <p>Genomic analysis of membrane protein families: abundance and conserved motifs</p>
      </shorttitle>
      <shortabs>
         <p>A genome-wide analysis was carried out on patterns of the classified polytopic membrane protein families, and the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families was also analyzed.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in the 26 genomes studied. <it>Caenorhabditis elegans</it> is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of <it>C. elegans</it>, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. Although this was noted in earlier studies, we now find that these motifs are particularly well conserved within families, especially those corresponding to transporters, symporters, and channels.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010001">Biochemistry and structural biology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010004">Cell biology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Genome-wide structural analyses in terms of patterns of protein folding have been useful in revealing functional and evolutionary relationships [<abbr bid="B1">1</abbr>,<abbr bid="B2">2</abbr>,<abbr bid="B3">3</abbr>,<abbr bid="B4">4</abbr>]. Given the abundance of membrane proteins, it would be highly desirable to have a similar analysis for this major category of structures; however, the number of known membrane protein structures remains small. Here we exploit the fact that membrane proteins can be classified into families on the basis of sequence similarities and topology, and use the family groupings to analyze genomic characteristics of membrane protein families.</p>
         <p>Most transmembrane proteins are formed from bundles of helices that traverse the membrane lipid bilayer. It is estimated that 20-30% of the proteins in known genomes are of this type [<abbr bid="B3">3</abbr>,<abbr bid="B4">4</abbr>,<abbr bid="B5">5</abbr>,<abbr bid="B6">6</abbr>]. The most general description of the transmembrane helical regions (TMs) is that they comprise a region of 18 or more amino acids with a largely hydrophobic character. This sequence feature can be identified in primary sequences using hydrophobicity scales [<abbr bid="B7">7</abbr>,<abbr bid="B8">8</abbr>,<abbr bid="B9">9</abbr>]. The most abundant amino acids in transmembrane regions are leucine, isoleucine, valine, phenylalanine, alanine, glycine, serine, and threonine. Taken together, these amino acids account for 75% of the amino acids in transmembrane regions [<abbr bid="B10">10</abbr>,<abbr bid="B11">11</abbr>,<abbr bid="B12">12</abbr>]. Analysis of the distribution of amino acids has revealed patterns in TM regions, for example GxxxG, which are thought to be important in helix-helix interactions [<abbr bid="B11">11</abbr>,<abbr bid="B12">12</abbr>,<abbr bid="B13">13</abbr>,<abbr bid="B14">14</abbr>].</p>
         <p>We took advantage of the classification of protein domains provided by others (Pfam-A and Pfam-B) [<abbr bid="B15">15</abbr>] to identify families that appear to be polytopic membrane proteins, and augmented these lists with additional family members based on amino-acid sequence comparisons. Furthermore, we identified additional families on the basis of clustering of amino-acid sequences, resulting in 637 distinct families. We used these families to analyze amino-acid compositions in the helical regions, pair motifs, domain structures, and patterns of families, and arrived at a number of generalizations. Among these are that glycine, tyrosine, and proline appear frequently in conserved locations within family transmembrane helices and that specific pair motifs are found in families that seem to be transporters, symporters, and channels. The number of kinds of domains and families seems to increase with the number of open reading frames (ORFs) in most genomes. Here we present our analysis and discuss these findings.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Classification of polytopic membrane protein domains</p>
            </st>
            <p>The procedure used to classify polytopic membrane domains is based mainly on family classification schemes (Pfam-A and Pfam-B) and is shown in Figure <figr fid="F1">1a</figr>. We identified families of polytopic membrane domains in Pfam [<abbr bid="B15">15</abbr>] by allocating TM-helices annotated in SWISS-PROT [<abbr bid="B16">16</abbr>] to proteins in Pfam. After conservatively picking 183 Pfam-A and 152 Pfam-B families, we conducted an analysis of the loops that connect TM-helices. This analysis showed that the loops tend to be short, with most of them (> 95%) having fewer than 80 amino acids. We therefore took 80 residues as the maximal intra-domain loop between TM-helices to define polytopic membrane domains. Although the 80-residue cutoff may not apply to a small portion (around 5%) of integral membrane proteins, it diminishes the chance of including soluble domains within membrane domains, given that the average soluble domain has about 170 residues [<abbr bid="B17">17</abbr>].</p>
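            <p>The loop-length rule above can be illustrated with a minimal sketch. It assumes that predicted TM-helices are available as 1-based, inclusive (start, end) spans and that a polytopic domain needs at least two helices; the function name <b>split_into_domains</b> and these conventions are hypothetical and are not taken from the original pipeline.</p>
            <p>
```python
MAX_LOOP = 80  # maximal intra-domain loop length stated in the text


def split_into_domains(helices, max_loop=MAX_LOOP):
    """Group TM-helix (start, end) spans into candidate polytopic domains.

    Consecutive helices stay in one domain while the loop between them
    has at most max_loop residues; a longer loop starts a new domain.
    Positions are 1-based and inclusive (an assumption of this sketch).
    """
    domains = []
    current = []
    for start, end in sorted(helices):
        if current:
            # Residues strictly between the previous helix and this one.
            loop_len = start - current[-1][1] - 1
            if loop_len > max_loop:
                domains.append(current)
                current = []
        current.append((start, end))
    if current:
        domains.append(current)
    # Keep only groups with two or more helices (assumed meaning of polytopic).
    return [d for d in domains if len(d) >= 2]


if __name__ == "__main__":
    spans = [(10, 30), (45, 65), (200, 220), (230, 250), (400, 420)]
    for domain in split_into_domains(spans):
        print(domain)
```
            </p>
            <p>In this toy input the 134-residue and 149-residue loops split the protein, and the final single helix is dropped because it does not form a polytopic domain on its own.</p>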
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Classification of polytopic membrane domains</p>
               </caption>
               <text>
                  <p>Classification of polytopic membrane domains. <b>(a)</b> Procedure for classifying polytopic membrane domains. Through automatic classification and manual examination, 228 Pfam-A, 298 Pfam-B and 121 clustered families were classified. <b>(b)</b> An example profile (PF01618) of a classified family of polytopic membrane domains consists of (from top to bottom): sequence alignment; an averaged hydrophobicity plot based on GES hydrophobicity values; consensus sequence displayed by sequence logo with conserved residues in hydrophobic regions highlighted; consensus sequences of TM-helices, where only conserved amino acids are shown in the single-letter code (with the remainder represented by "x").</p>
               </text>
               <graphic file="gb-2002-3-10-research0054-1"/>
            </fig>
            <p>We used TMHMM, a membrane protein prediction program based on a hidden Markov model [<abbr bid="B6">6</abbr>], to predict the TM-helices of membrane proteins in the 26 genomes. Polytopic membrane domains were identified using the loop size between TM-helices as a guide. These domains were then classified into 231 Pfam-A and 318 Pfam-B families either by direct SWISS-PROT ID matching or by sequence similarity matching using FASTA [<abbr bid="B18">18</abbr>]. In most of the aligned domains, the TM-helices also aligned well, especially in Pfam-A families, which have alignments based on manually crafted hidden Markov models. Unclassified domains were clustered into 121 families by their sequence similarities. For each family, a profile was constructed, as shown in Figure <figr fid="F1">1b</figr>. This included: an averaged hydrophobicity plot of all members in the family based on the Goldman-Engelman-Steitz (GES) scale [<abbr bid="B8">8</abbr>]; a consensus sequence of the family, represented by a sequence logo plot [<abbr bid="B19">19</abbr>]; and consensus sequences of the TM-helices. By analyzing the hydrophobicity plots, we located TM-helices in the aligned sequences of each protein family and assigned a number of TM-helices to each family. Some families, including 3 in Pfam-A and 20 in Pfam-B, were eliminated at this step, owing to the ambiguity of the TM-helices observed in the plot. From this process, we identified 228 Pfam-A, 298 Pfam-B and 121 clustered families for our analyses, with approximately 95% of domains classified in Pfam families.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of the number of TM-helices in Pfam-A families of polytopic membrane domains</p>
            </st>
            <p>After assigning a number of TM-helices to each family, we conducted a survey of the assigned numbers of TM-helices in 228 Pfam-A families of polytopic membrane domains (Figure <figr fid="F2">2</figr>). Pfam-A families are manually classified families that have well-aligned protein domains, and most of them have a well-defined number for TM-helices. We also picked families in solute transport systems that are annotated as transporters, symporters and channels, and analyzed the number of TM-helices for these families (Figure <figr fid="F2">2</figr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Number of TM-helices in Pfam-A families of polytopic membrane domains</p>
               </caption>
               <text>
                  <p>Number of TM-helices in Pfam-A families of polytopic membrane domains. Shown is the number of Pfam-A families of polytopic membrane domains with a given number of TM-helices. Only families with more than 20 members were counted. The green bars indicate numbers from all studied Pfam-A families and the yellow bars those from the Pfam-A families that are annotated as transporters, symporters, and channels.</p>
               </text>
               <graphic file="gb-2002-3-10-research0054-2"/>
            </fig>
            <p>In general, most Pfam-A families have a small number of TM-helices. For those with seven or fewer TM-helices, the number of families does not vary significantly with helix number, although there are more families with two or four TM-helices than with three, five, six, or seven. For families with more than seven TM-helices, the number of families decreases sharply as the number of TM-helices increases. Families with 12 TM-helices are the exception, however; they form a small peak against the overall downward slope of the plot. We carried out the same kind of analysis on Pfam-A families that are annotated as transporters, symporters, and channels, and found that these transporter-like families preferentially have 12 TM-helices. Conversely, most (11 out of 12) Pfam-A families with 12 TM-helices are transporter-like families. There also seems to be a tendency for the transporter-like families to have an even number of TM-helices, because families with 2, 4, 6, 8, and 12 TM-helices occur more often than those with a neighboring odd number of TM-helices.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of amino-acid distribution and pair motifs</p>
            </st>
            <p>We selected 168 families from Pfam-A that had more than 20 members. For each of these families, we then generated consensus sequences with a conservation value (R<sub>sequence</sub>) using the Alpro program [<abbr bid="B19">19</abbr>]. Relatively conserved amino acids in the consensus sequences (R<sub>sequence</sub> value > 3.0, representing the top 15% of R<sub>sequence</sub> values over all amino acids) that lie in TM-helical regions were analyzed for their composition as well as for pair motifs.</p>
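            <p>As a rough illustration of a per-position conservation value, the sketch below computes the information content of an alignment column in bits, using the standard sequence-logo definition log2(20) minus the Shannon entropy of the column. This is an assumption about what R<sub>sequence</sub> measures: the small-sample correction used in sequence logos is omitted, gaps are not handled, and the function names are hypothetical rather than part of the Alpro program.</p>
            <p>
```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one gap-free alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies; no small-sample correction is applied.
    """
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def conserved_positions(alignment, threshold=3.0):
    """Return 0-based column indices whose information exceeds threshold.

    The alignment is a list of equal-length, gap-free sequences
    (a simplification made for this sketch).
    """
    columns = zip(*alignment)
    return [i for i, col in enumerate(columns)
            if column_information(col) > threshold]


if __name__ == "__main__":
    toy = ["GLA", "GIA", "GVS", "GMS"]
    print(conserved_positions(toy))
```
            </p>
            <p>In the toy alignment, the invariant glycine column and the two-residue column pass the 3.0-bit threshold, whereas the column that varies among leucine, isoleucine, valine, and methionine does not.</p>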
            <p>We compared the amino-acid composition of the TM-helices in general with the composition of only the conserved positions in TM-helices in the 168 families (Figure <figr fid="F3">3</figr>). We noticed that some amino acids are considerably more prevalent in the conserved positions, such as glycine (8% average composition in TM-helices versus 19% composition in conserved positions of TM-helices), proline (4% versus 9%) and tyrosine (3% versus 5%). In contrast, isoleucine (10% versus 4%), valine (8% versus 4%), methionine (4% versus 1%) and threonine (7% versus 4%) are less prevalent in conserved positions.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Amino-acid compositions of TM-helices</p>
               </caption>
               <text>
                  <p>Amino-acid compositions of TM-helices. The amino-acid composition in the TM-helical regions <b>(a)</b> for all sequences and <b>(b)</b> of consensus sequences, for the 168 Pfam-A families of polytopic membrane domains that contain more than 20 members.</p>
               </text>
               <graphic file="gb-2002-3-10-research0054-3"/>
            </fig>
            <p>As might be expected, the changes in prevalence of certain amino acids reflect their conservation in the consensus sequence. Therefore, glycine, proline and tyrosine are relatively conserved residues in TM-helical regions, and isoleucine, valine, methionine and threonine have relatively high mutability. This result correlates very well with the mutation data matrix (MDM) for multi-spanning transmembrane regions in membrane proteins [<abbr bid="B10">10</abbr>]. In the MDM of multi-spanning transmembrane &#945; helices, isoleucine, methionine and valine are found to have relatively high mutability as hydrophobic residues, and serine and threonine also rank high in mutability as polar residues. In the matrix, proline appears to be highly conserved. Our results confirm these findings; in addition, we find that glycine and tyrosine are also highly conserved residues in polytopic TM-helices.</p>
            <p>We also analyzed the consensus sequences of the 168 Pfam-A families for significant amino-acid pair motifs and compared our findings with previous studies. Table <tblr tid="T1">1</tblr> shows three pair lists: the first includes the top 50 pairs of Senes <it>et al.</it> with their significance [<abbr bid="B12">12</abbr>]; the second includes the top 50 pairs with their occurrences from randomly generated pairs; and the third includes the top 50 pairs with their occurrences using Senes <it>et al</it>.'s top 200 most significant pairs. In all three lists, the GxxxG pair ranks first, highlighting its significance in TM-helices [<abbr bid="B12">12</abbr>,<abbr bid="B13">13</abbr>,<abbr bid="B14">14</abbr>]. In the last list, which contains top-ranked pairs from the first two lists, we observed some interesting pair-motif patterns that are associated with glycine. Amino-acid pairs such as ZxxxZ and ZxxxxxxZ (Z represents glycine, alanine, or serine - residues with a small side chain) are highly ranked in the last list. It is known that amino acids are positioned with an average of 3.6 residues per turn in TM-helices [<abbr bid="B20">20</abbr>]. Two residues that are separated by three or six intervening residues are thus oriented in approximately the same direction. Therefore, it was suggested that these motifs are favored for TM-helix packing [<abbr bid="B12">12</abbr>,<abbr bid="B14">14</abbr>]. Our results agree well with the earlier findings for pair motifs formed by small residues, but do not favor pairs with &#946;-branched aliphatic residues (isoleucine and valine). This is probably because isoleucine and valine are highly mutable residues in TM-helices.</p>
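            <p>The pair notation and the helical-face argument can be made concrete with a short sketch. It assumes that consensus TM-helix sequences are strings in which non-conserved positions are written as "x", that a pair such as GG4 means glycine at position i and glycine at position i+4 (so GxxxG is GG4 and GxxxxxxG is GG7), and that an ideal helix with 3.6 residues per turn rotates 100 degrees per residue. The function names, the maximum separation of 10, and counting every occurrence in every helix are illustrative choices, not the procedure used to build Table 1.</p>
            <p>
```python
from collections import Counter

DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn


def count_pairs(consensus_helices, max_sep=10):
    """Count conserved residue pairs, keyed as in the text (e.g. 'GG4').

    Each helix is a string with 'x' at non-conserved positions. A key
    'XYn' means residue X at position i and residue Y at position i + n.
    """
    counts = Counter()
    for helix in consensus_helices:
        for i, a in enumerate(helix):
            if a == "x":
                continue
            for sep in range(1, max_sep + 1):
                j = i + sep
                if j >= len(helix):
                    break
                b = helix[j]
                if b != "x":
                    counts[f"{a}{b}{sep}"] += 1
    return counts


def angular_offset(sep):
    """Angle (degrees, folded into -180..180) between residues i and i + sep."""
    offset = (sep * DEGREES_PER_RESIDUE) % 360.0
    if offset > 180.0:
        offset -= 360.0
    return offset


if __name__ == "__main__":
    print(count_pairs(["xGxxxGxxAxx"]).most_common())
    for sep in (1, 2, 3, 4, 7):
        print(sep, angular_offset(sep))
```
            </p>
            <p>Under these assumptions, residues at i and i+4 differ by about 40 degrees and residues at i and i+7 by about 20 degrees, which is why both spacings place the two side chains on roughly the same face of the helix.</p>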
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Top amino-acid pairs in transmembrane helices of the consensus sequences of classified Pfam-A families</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c cspan="2" ca="left">
                        <p>List 1: top 50 pairs and their significance from Senes <it>et al</it>. [<abbr bid="B12">12</abbr>]</p>
                     </c>
                     <c cspan="2" ca="left">
                        <p>List 2: top 50 pairs and their occurrences from random pairs</p>
                     </c>
                     <c cspan="2" ca="left">
                        <p>List 3: top 50 pairs and their occurrences in lists 1 and 2</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>GG4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>6.35 &#215; 10<sup>-34</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GG4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>46</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GG4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>46</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>II4</p>
                     </c>
                     <c ca="left">
                        <p>8.36 &#215; 10<sup>-24</sup></p>
                     </c>
                     <c ca="left">
                        <p>GG3</p>
                     </c>
                     <c ca="left">
                        <p>32</p>
                     </c>
                     <c ca="left">
                        <p>GL3</p>
                     </c>
                     <c ca="left">
                        <p>28</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>GA4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>3.61 &#215; 10<sup>-21</sup></p>
                     </c>
                     <c ca="left">
                        <p>GG1</p>
                     </c>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GG7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IG1</p>
                     </c>
                     <c ca="left">
                        <p>4.79 &#215; 10<sup>-21</sup></p>
                     </c>
                     <c ca="left">
                        <p>GG2</p>
                     </c>
                     <c ca="left">
                        <p>29</p>
                     </c>
                     <c ca="left">
                        <p>GL1</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IG2</p>
                     </c>
                     <c ca="left">
                        <p>1.29 &#215; 10<sup>-16</sup></p>
                     </c>
                     <c ca="left">
                        <p>GL3</p>
                     </c>
                     <c ca="left">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>AG7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VG2</p>
                     </c>
                     <c ca="left">
                        <p>5.73 &#215; 10<sup>-16</sup></p>
                     </c>
                     <c ca="left">
                        <p>LL1</p>
                     </c>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GA7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IV4</p>
                     </c>
                     <c ca="left">
                        <p>2.12 &#215; 10<sup>-15</sup></p>
                     </c>
                     <c ca="left">
                        <p>LG2</p>
                     </c>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>AG4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IP1</p>
                     </c>
                     <c ca="left">
                        <p>4.52 &#215; 10<sup>-15</sup></p>
                     </c>
                     <c ca="left">
                        <p>GF4</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>PL2</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VV4</p>
                     </c>
                     <c ca="left">
                        <p>3.75 &#215; 10<sup>-14</sup></p>
                     </c>
                     <c ca="left">
                        <p>FL3</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>AS4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VI4</p>
                     </c>
                     <c ca="left">
                        <p>1.09 &#215; 10<sup>-12</sup></p>
                     </c>
                     <c ca="left">
                        <p>LL7</p>
                     </c>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>AL6</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AV1</p>
                     </c>
                     <c ca="left">
                        <p>2.17 &#215; 10<sup>-12</sup></p>
                     </c>
                     <c ca="left">
                        <p>GL4</p>
                     </c>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>LP1</p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GL3</p>
                     </c>
                     <c ca="left">
                        <p>9.69 &#215; 10<sup>-12</sup></p>
                     </c>
                     <c ca="left">
                        <p>GG6</p>
                     </c>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>PG9</p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>AG4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>9.06 &#215; 10<sup>-10</sup></p>
                     </c>
                     <c ca="left">
                        <p>LL5</p>
                     </c>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GA4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>WQ1</p>
                     </c>
                     <c ca="left">
                        <p>3.87 &#215; 10<sup>-09</sup></p>
                     </c>
                     <c ca="left">
                        <p>LL3</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>FG1</p>
                     </c>
                     <c ca="left">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IL4</p>
                     </c>
                     <c ca="left">
                        <p>4.89 &#215; 10<sup>-09</sup></p>
                     </c>
                     <c ca="left">
                        <p>LG3</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>SL1</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AA3</p>
                     </c>
                     <c ca="left">
                        <p>1.33 &#215; 10<sup>-08</sup></p>
                     </c>
                     <c ca="left">
                        <p>LG6</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>SG4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VG1</p>
                     </c>
                     <c ca="left">
                        <p>1.83 &#215; 10<sup>-08</sup></p>
                     </c>
                     <c ca="left">
                        <p>LL8</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>PL1</p>
                     </c>
                     <c ca="left">
                        <p>14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>GG7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>2.95 &#215; 10<sup>-08</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GG7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>AA7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VL4</p>
                     </c>
                     <c ca="left">
                        <p>7.71 &#215; 10<sup>-08</sup></p>
                     </c>
                     <c ca="left">
                        <p>GA1</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>AG5</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IS2</p>
                     </c>
                     <c ca="left">
                        <p>8.98 &#215; 10<sup>-08</sup></p>
                     </c>
                     <c ca="left">
                        <p>LG10</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>LF8</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SI2</p>
                     </c>
                     <c ca="left">
                        <p>1.52 &#215; 10<sup>-07</sup></p>
                     </c>
                     <c ca="left">
                        <p>GG8</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>IA1</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GI1</p>
                     </c>
                     <c ca="left">
                        <p>2.93 &#215; 10<sup>-07</sup></p>
                     </c>
                     <c ca="left">
                        <p>LA1</p>
                     </c>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>GV1</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IY10</p>
                     </c>
                     <c ca="left">
                        <p>4.55 &#215; 10<sup>-07</sup></p>
                     </c>
                     <c ca="left">
                        <p>LL2</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>AI1</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>YY3</p>
                     </c>
                     <c ca="left">
                        <p>6.3 &#215; 10<sup>-07</sup></p>
                     </c>
                     <c ca="left">
                        <p>FG7</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>AA2</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IF10</p>
                     </c>
                     <c ca="left">
                        <p>1.63 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>FL1</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>GL2</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GI2</p>
                     </c>
                     <c ca="left">
                        <p>3.27 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>LG4</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>AA3</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PI3</p>
                     </c>
                     <c ca="left">
                        <p>3.99 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>GA3</p>
                     </c>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>SL2</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PV1</p>
                     </c>
                     <c ca="left">
                        <p>4.97 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>FG4</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>PG5</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PL1</p>
                     </c>
                     <c ca="left">
                        <p>5.35 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>GG5</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>PG6</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LP1</p>
                     </c>
                     <c ca="left">
                        <p>5.35 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>GL7</p>
                     </c>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>IL4</p>
                     </c>
                     <c ca="left">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CG4</p>
                     </c>
                     <c ca="left">
                        <p>5.4 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>GL1</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>GS5</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VY9</p>
                     </c>
                     <c ca="left">
                        <p>5.58 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>AG7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>VL4</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GV2</p>
                     </c>
                     <c ca="left">
                        <p>6.04 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>FG8</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>GV2</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VP1</p>
                     </c>
                     <c ca="left">
                        <p>7.45 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>LL4</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>IG1</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IA1</p>
                     </c>
                     <c ca="left">
                        <p>7.93 &#215; 10<sup>-06</sup></p>
                     </c>
                     <c ca="left">
                        <p>GV3</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>PG10</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PL2</p>
                     </c>
                     <c ca="left">
                        <p>1.13 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>AG3</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>LY6</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GN4</p>
                     </c>
                     <c ca="left">
                        <p>1.38 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>GF1</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>LF10</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GS5</p>
                     </c>
                     <c ca="left">
                        <p>1.43 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>LA2</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>SA6</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VA2</p>
                     </c>
                     <c ca="left">
                        <p>2.51 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>AG1</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>LG5</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HQ1</p>
                     </c>
                     <c ca="left">
                        <p>2.7 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>FL5</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>SA3</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VY10</p>
                     </c>
                     <c ca="left">
                        <p>2.95 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>AG4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>PF1</p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IQ2</p>
                     </c>
                     <c ca="left">
                        <p>3.1 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>FG5</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GS4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LN2</p>
                     </c>
                     <c ca="left">
                        <p>5.74 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>FF1</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>IV4</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM9</p>
                     </c>
                     <c ca="left">
                        <p>6.84 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>GA7</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>LS1</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PA9</p>
                     </c>
                     <c ca="left">
                        <p>8.25 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>FG2</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>GY8</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VC5</p>
                     </c>
                     <c ca="left">
                        <p>9.87 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>AF3</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>IG2</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>QD3</p>
                     </c>
                     <c ca="left">
                        <p>9.95 &#215; 10<sup>-05</sup></p>
                     </c>
                     <c ca="left">
                        <p>GP2</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>LF9</p>
                     </c>
                     <c ca="left">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LY10</p>
                     </c>
                     <c ca="left">
                        <p>1.19 &#215; 10<sup>-04</sup></p>
                     </c>
                     <c ca="left">
                        <p>PL2</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>VF8</p>
                     </c>
                     <c ca="left">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SV2</p>
                     </c>
                     <c ca="left">
                        <p>1.24 &#215; 10<sup>-04</sup></p>
                     </c>
                     <c ca="left">
                        <p>FF5</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>VG6</p>
                     </c>
                     <c ca="left">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DE4</p>
                     </c>
                     <c ca="left">
                        <p>1.51 &#215; 10<sup>-04</sup></p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>AS4</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>GN4</p>
                     </c>
                     <c ca="left">
                        <p>8</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>A pair XY<it>n</it> denotes amino acids X and Y separated by (<it>n</it>-1) residues. List 1 shows the top 50 amino-acid pairs and their significance as calculated by the TMSTAT method [<abbr bid="B12">12</abbr>]; list 2 shows the top 50 amino-acid pairs generated from random amino-acid pairs, with their occurrences in the consensus sequences of Pfam-A families of polytopic membrane domains; and list 3 shows the top 50 amino-acid pairs from the intersection of lists 1 and 2 (that is, the top 200 pairs as judged by TMSTAT, with their occurrences in the consensus sequences of Pfam-A families of polytopic membrane domains). Pairs of small-side-chain amino acids, such as GG4 and AS7, are in bold.</p>
               </tblfn>
            </tbl>
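            <p>The XY<it>n</it> notation used in Table 1 can be illustrated with a short sketch. The function name and the example helix below are hypothetical and are not taken from TMSTAT or from the Pfam data; the sketch only shows how a pair such as GG4 (that is, GxxxG) maps onto positions in a sequence.</p>

```python
def pair_positions(sequence, pair):
    """Return the start indices at which the pair XYn occurs in `sequence`.

    `pair` is a string such as "GG4": residue X at position i and residue Y
    at position i + n, so that X and Y are separated by (n - 1) residues.
    """
    x, y, n = pair[0], pair[1], int(pair[2:])
    return [
        i
        for i in range(len(sequence) - n)
        if sequence[i] == x and sequence[i + n] == y
    ]


# Hypothetical transmembrane segment, for illustration only.
helix = "LIGAVGLLAGVIAG"

print(pair_positions(helix, "GG4"))  # GxxxG motifs
print(pair_positions(helix, "GG7"))  # GxxxxxxG motifs
```

            <p>Counting such positions over the consensus sequence of each family would give occurrence counts of the kind listed in Table 1, assuming each occurrence is counted once per start position.</p>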
            <p>Of the 168 Pfam-A families of polytopic membrane domains we studied, 45 (27%) are classified as transporters, channels, or symporters. We examined GxxxG and GxxxxxxG pairs (GG4 and GG7) and found that they are preferentially associated with transporter/channel-like membrane proteins (Table <tblr tid="T2">2</tblr>). When one or both glycines are replaced by another small residue, such as serine or alanine, this association is weakened. GxxxG and GxxxxxxG pairs are therefore relatively conserved in transporter/channel-like membrane proteins. Comparing the amino-acid composition of conserved residues in the TM-helices of the transporter-like families with that of the remaining Pfam-A families (Table <tblr tid="T3">3</tblr>), we found that glycine is about twice as frequent among the conserved residues of the transporter-like families, reflecting the favored GxxxG and GxxxxxxG pairs in these families. Proline and asparagine are also among the conserved residues favored in transporter-like families, whereas cysteine, histidine, isoleucine, leucine, methionine, and valine are disfavored.</p>
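            <p>The percentages in Table 2 follow directly from the family counts and can be compared with the background fraction of transporter-like families (45 of 168, about 27%). The sketch below recomputes them; the fold-over-background ratio is an illustrative addition and is not a statistic reported in the table.</p>

```python
# Counts copied from Table 2:
# (transporter/symporter/channel families, all Pfam-A families with the pair)
TABLE2 = {
    "GG4": (18, 38),
    "GA4/AG4/AA4": (11, 36),
    "GS4/SG4/SS4": (4, 25),
    "GG7": (7, 16),
    "GA7/AG7/AA7": (5, 18),
    "GS7/SG7/SS7": (6, 22),
}

# Background: 45 transporter-like families among all 168 families studied.
BACKGROUND = (45, 168)


def percentage(hits, total):
    """Percentage of families with the pair that are transporter-like."""
    return 100.0 * hits / total


def fold_over_background(hits, total, background=BACKGROUND):
    """Ratio of the observed fraction to the background fraction."""
    return (hits / total) / (background[0] / background[1])


for label, (hits, total) in TABLE2.items():
    print(
        f"{label:12s} {percentage(hits, total):5.1f}%  "
        f"{fold_over_background(hits, total):.2f}x background"
    )
```

            <p>A ratio above 1 indicates that families containing the pair are transporter-like more often than the background fraction would suggest; the ratio alone says nothing about statistical significance.</p>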
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Association of GG4 and GG7 pairs with Pfam-A families annotated as transporters, symporters, and channels</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Pairs</p>
                     </c>
                     <c ca="center">
                        <p>Transporter/symporter/channel Pfam-A families</p>
                     </c>
                     <c ca="center">
                        <p>All Pfam-A families</p>
                     </c>
                     <c ca="center">
                        <p>Percentage (%)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GG4</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>47.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GA4, AG4, AA4</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>30.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GS4, SG4, SS4</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>16.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GG7</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>43.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GA7, AG7, AA7</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>27.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GS7, SG7, SS7</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>27.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>All pairs</p>
                     </c>
                     <c ca="center">
                        <p>45</p>
                     </c>
                     <c ca="center">
                        <p>168</p>
                     </c>
                     <c ca="center">
                        <p>26.8</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>A pair XY<it>n</it> corresponds to amino acids X and Y separated by (<it>n</it>-1) residues.</p>
               </tblfn>
            </tbl>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Comparison of the amino-acid composition of the conserved residues in the TM-helices of the 45 transporter Pfam-A families with that of the other 123 Pfam-A families</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Amino acid</p>
                     </c>
                     <c ca="center">
                        <p>Conserved residues in TMs of transporter families (%)</p>
                     </c>
                     <c ca="center">
                        <p>Conserved residues in TMs of the other families (%)</p>
                     </c>
                     <c ca="center">
                        <p>Ratio</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>G</p>
                     </c>
                     <c ca="center">
                        <p>31.4</p>
                     </c>
                     <c ca="center">
                        <p>15.6</p>
                     </c>
                     <c ca="center">
                        <p>2.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>3.2</p>
                     </c>
                     <c ca="center">
                        <p>2.5</p>
                     </c>
                     <c ca="center">
                        <p>1.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P</p>
                     </c>
                     <c ca="center">
                        <p>10.3</p>
                     </c>
                     <c ca="center">
                        <p>8.0</p>
                     </c>
                     <c ca="center">
                        <p>1.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>D</p>
                     </c>
                     <c ca="center">
                        <p>2.3</p>
                     </c>
                     <c ca="center">
                        <p>1.9</p>
                     </c>
                     <c ca="center">
                        <p>1.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>R</p>
                     </c>
                     <c ca="center">
                        <p>1.8</p>
                     </c>
                     <c ca="center">
                        <p>1.5</p>
                     </c>
                     <c ca="center">
                        <p>1.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A</p>
                     </c>
                     <c ca="center">
                        <p>8.6</p>
                     </c>
                     <c ca="center">
                        <p>7.7</p>
                     </c>
                     <c ca="center">
                        <p>1.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Q</p>
                     </c>
                     <c ca="center">
                        <p>2.3</p>
                     </c>
                     <c ca="center">
                        <p>2.1</p>
                     </c>
                     <c ca="center">
                        <p>1.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>T</p>
                     </c>
                     <c ca="center">
                        <p>3.9</p>
                     </c>
                     <c ca="center">
                        <p>4.0</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>W</p>
                     </c>
                     <c ca="center">
                        <p>3.6</p>
                     </c>
                     <c ca="center">
                        <p>3.8</p>
                     </c>
                     <c ca="center">
                        <p>0.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>E</p>
                     </c>
                     <c ca="center">
                        <p>1.9</p>
                     </c>
                     <c ca="center">
                        <p>2.1</p>
                     </c>
                     <c ca="center">
                        <p>0.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>S</p>
                     </c>
                     <c ca="center">
                        <p>4.4</p>
                     </c>
                     <c ca="center">
                        <p>5.4</p>
                     </c>
                     <c ca="center">
                        <p>0.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>F</p>
                     </c>
                     <c ca="center">
                        <p>7.5</p>
                     </c>
                     <c ca="center">
                        <p>9.6</p>
                     </c>
                     <c ca="center">
                        <p>0.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>K</p>
                     </c>
                     <c ca="center">
                        <p>0.9</p>
                     </c>
                     <c ca="center">
                        <p>1.2</p>
                     </c>
                     <c ca="center">
                        <p>0.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Y</p>
                     </c>
                     <c ca="center">
                        <p>3.5</p>
                     </c>
                     <c ca="center">
                        <p>5.1</p>
                     </c>
                     <c ca="center">
                        <p>0.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>L</p>
                     </c>
                     <c ca="center">
                        <p>8.4</p>
                     </c>
                     <c ca="center">
                        <p>13.1</p>
                     </c>
                     <c ca="center">
                        <p>0.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>V</p>
                     </c>
                     <c ca="center">
                        <p>2.3</p>
                     </c>
                     <c ca="center">
                        <p>4.6</p>
                     </c>
                     <c ca="center">
                        <p>0.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>M</p>
                     </c>
                     <c ca="center">
                        <p>0.6</p>
                     </c>
                     <c ca="center">
                        <p>1.6</p>
                     </c>
                     <c ca="center">
                        <p>0.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>I</p>
                     </c>
                     <c ca="center">
                        <p>1.9</p>
                     </c>
                     <c ca="center">
                        <p>5.2</p>
                     </c>
                     <c ca="center">
                        <p>0.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H</p>
                     </c>
                     <c ca="center">
                        <p>0.9</p>
                     </c>
                     <c ca="center">
                        <p>3.6</p>
                     </c>
                     <c ca="center">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>0.1</p>
                     </c>
                     <c ca="center">
                        <p>1.6</p>
                     </c>
                     <c ca="center">
                        <p>0.1</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Amino-acid composition sorted by ratio of composition in transporter families over that in the other families.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Genome-wide analysis of families of polytopic membrane domains</p>
            </st>
            <p>Classified polytopic membrane protein domains represent between 40% and 81% of the total polytopic membrane domains in the genomes studied, with an average coverage of 61% (Figure <figr fid="F4">4a</figr>). We kept the family classification relatively conservative rather than aiming for high overall coverage with a less careful classification. To avoid including falsely predicted families, we based our analysis on families with no fewer than four members. A higher proportion of polytopic membrane domains could, however, be classified if smaller families were considered (Figure <figr fid="F4">4a</figr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Classified polytopic membrane domains in 26 genomes</p>
               </caption>
               <text>
                  <p>Classified polytopic membrane domains in 26 genomes. <b>(a)</b> The dark-green bars represent the percentage of polytopic membrane domains that are classified in each genome, using only classified families with at least four members. When classified families containing two or three members are included in this analysis, the additional coverage is represented by light-green bars. <b>(b)</b> The proportion of polytopic membrane domains classified by different methods in all genomes studied. Most polytopic membrane domains are identified by direct ID match and sequence-similarity (FASTA) match to members of classified Pfam-A families (green and light-green bars) and Pfam-B families (yellow and light-yellow bars). A small proportion of polytopic membrane domains are clustered on the basis of their sequence similarity (gray bars). For abbreviations for genomes, see Materials and methods.</p>
               </text>
               <graphic file="gb-2002-3-10-research0054-4"/>
            </fig>
            <p>We classified polytopic membrane domains into Pfam-A, Pfam-B and self-clustered families. Figure <figr fid="F4">4b</figr> shows the distribution of these three kinds of families in all the genomes. Most of the classified polytopic membrane domains belong to Pfam-A and Pfam-B, which cover 95% of classified domains.</p>
            <p>Classified polytopic membrane domains and their families were studied in relation to the number of ORFs in each genome. Figure <figr fid="F5">5a</figr> shows the number of classified polytopic membrane domains versus the number of ORFs in all the genomes, and Figure <figr fid="F5">5b</figr> shows the same relation in genomes of single-celled organisms. A roughly linear relation seems to exist between the number of classified polytopic membrane domains and the number of ORFs in each genome. However, <it>C. elegans</it> is an obvious outlier from the trend. To try to explain this, we took a closer look at the largest families of polytopic membrane domains in <it>C. elegans</it> (Figure <figr fid="F5">5c</figr>). The three largest families in <it>C. elegans</it> are PF01604, PF01461, and PB000009, which are described as 7-TM chemoreceptor families. (The annotation of PB000009 is from PD000148 in Prodom [<abbr bid="B21">21</abbr>].) These families are almost unique to <it>C. elegans,</it> as most of their members in Pfam are from <it>C. elegans.</it> They are highly amplified, with 289, 250, and 216 members, respectively. Each of these numbers is at least double the size of the largest family in <it>Drosophila melanogaster,</it> PF00083 (Sugar (and other) transporter), which has 108 members. After removing the proteins in these three families (a total of 754), <it>C. elegans</it> fits the trend line better. The unusually large number of polytopic membrane domains is therefore likely to be caused by protein amplification in a few families.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Classified polytopic membrane domains in relation to the number of ORFs in the 26 genomes studied</p>
               </caption>
               <text>
                  <p>Classified polytopic membrane domains in relation to the number of ORFs in the 26 genomes studied. <b>(a, b)</b> Plots of the number of classified polytopic membrane domains versus the number of ORFs in (a) all the studied genomes and (b) in genomes of single-celled organisms. The trend lines, though generated on the basis of data in each plot, have almost the same slope. CE<sup>*</sup> in red indicates the number of classified polytopic membrane domains in <it>C. elegans</it> after the three big 7-TM chemoreceptor families are removed (see (c)). <b>(c)</b> The top ten families of polytopic membrane domains, as judged by their occurrence in <it>C. elegans.</it> <b>(d)</b> Plot of the number of classified families of polytopic membrane domains versus the logarithm of the number of ORFs in each genome.</p>
               </text>
               <graphic file="gb-2002-3-10-research0054-5"/>
            </fig>
            <p>This hypothesis is supported by Figure <figr fid="F5">5d</figr>, which shows the number of families of polytopic membrane domains in relation to the number of ORFs in the studied genomes. The number of families seems to scale with the logarithm of the number of ORFs in all studied genomes, including <it>C. elegans.</it> Given that <it>C. elegans</it> has an unusually large number of polytopic membrane domains but a normal number of families, the amplification of polytopic membrane domains appears to be limited to a few families.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Polytopic membrane domains of integral membrane proteins in 26 genomes have been classified into 637 families, which include 218 Pfam-A, 298 Pfam-B and 121 clustered families. Only families that are reasonably large (&#8805; 4 members) were selected. The classified families were used for studies of amino-acid distribution and patterns, and for genome-wide analysis.</p>
         <p>Our studies on amino-acid distribution and patterns were conducted on Pfam-A families. We also analyzed Pfam-B and the clustered families, but found fewer conserved positions, probably because the Pfam-B and the clustered families are not as carefully aligned as Pfam-A families. In the analysis of amino-acid positions, glycine, proline and tyrosine were found to be the most conserved residues in TM-helical regions, whereas isoleucine, valine, methionine and threonine were identified as the least conserved residues, relative to average occurrence. This result is mostly consistent with previous results from an MDM [<abbr bid="B10">10</abbr>]. Although hydrophobic residues such as leucine and isoleucine are among the most abundant residues in TM-helices, they are not well conserved in position. The observed conservation in position for residues such as glycine, proline and tyrosine raises the question of whether these residues are associated with the functions of integral membrane proteins.</p>
         <p>We also studied amino-acid pair motifs in the conserved sequences in classified families. We show that pairs consisting of a glycine and another small amino acid (glycine, alanine or serine) and facing the same direction in TM &#945;-helices are common in conserved positions. As those pair motifs have been shown to be important for packing of TM-helices [<abbr bid="B12">12</abbr>,<abbr bid="B13">13</abbr>,<abbr bid="B14">14</abbr>], conservation of those motifs probably implies their importance in folding stability of integral membrane proteins, as is the case with hydrophobic residues found in the core regions of soluble proteins.</p>
         <p>Our results have some interesting implications for the classified Pfam-A families annotated as transporters, symporters and channels. First, there is a preference for 12 TM-helices among these families. As there is no 12-TM transporter protein structure available, we do not know exactly why a 12 TM-helix bundle is preferred for transport. The structure of MsbA from <it>Escherichia coli</it> [<abbr bid="B22">22</abbr>], an ATP-binding cassette (ABC) transporter homolog, was recently solved. It contains 12 TM-helices in a homodimer of two 6-TM-helical bundles, which form a central chamber to translocate substrates. However, it is unlikely that polytopic membrane domains in the 12-TM Pfam-A families have a structure like that of ABC transporters: as there is no obvious internal sequence similarity within the sequence containing the 12 TM-helices, such a domain is unlikely to form two 6-TM-helical bundles. By looking at structures of other transport proteins, including the potassium channel [<abbr bid="B23">23</abbr>], the mechanosensitive ion channel [<abbr bid="B24">24</abbr>], the aquaporin water channel [<abbr bid="B25">25</abbr>], and the glycerol facilitator channel [<abbr bid="B26">26</abbr>], it is apparent that 7-10 TM-helices are needed to form a tunnel and transport molecules. This means that proteins with a small number of TM-helices must oligomerize to form a proper tunnel to translocate molecules through the membrane. Second, families of these proteins tend to have GxxxG and GxxxxxxG instead of related motifs that have one or both glycines changed to alanine or serine. While this preference is interesting, we do not know its origin. Perhaps it reflects especially tight packing among helices in transporters, permitting the C&#945;-H...O hydrogen bonding that has been discussed [<abbr bid="B14">14</abbr>].</p>
         <p>We also studied the distribution of classified families in 26 genomes. Although the classified families of polytopic membrane domains do not provide complete coverage of the total potential polytopic membrane domains, we think they include most membrane proteins that have essential functions in these genomes. The excluded domains are either unique in function for the organism or falsely predicted. In most genomes the number of classified polytopic membrane domains seems to have a linear relation with the number of ORFs. However, <it>C. elegans</it> is an outlier to this trend. By studying the families in <it>C. elegans,</it> we found that it has an exceptional number of 7-TM-helical membrane domains, most of which are annotated as chemoreceptors. As <it>C. elegans</it> cannot see or hear but must search for food, chemosensation is key to survival. <it>C. elegans</it> mediates chemosensation by 32 neurons that are mostly arranged in bilateral pairs on the left and right sides, and it is estimated that there are about 500 G-protein-coupled receptors that act in chemosensation [<abbr bid="B27">27</abbr>]. We have now identified many chemoreceptors (750), classified into three large families. Therefore, classification of polytopic membrane domains into families gives us another way to look at the distribution and functions of integral membrane proteins in genomes.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Databases</p>
            </st>
            <p>In this study, the following databases were used: SWISS-PROT (release 39, updated to 19 December 2000) [<abbr bid="B16">16</abbr>], which contains 91,132 protein entries; Pfam (release 6.1) [<abbr bid="B15">15</abbr>], which contains 2,727 protein families in Pfam-A and 40,230 families in Pfam-B; and the Proteome Analysis Database [<abbr bid="B28">28</abbr>], from which complete non-redundant proteomes were downloaded. We selected eight genomes from archaea: <it>Archaeoglobus fulgidus</it> (AF), <it>Aeropyrum pernix K1</it> (AP), <it>Halobacterium</it> sp. (HS), <it>Methanococcus jannaschii</it> (MJ), <it>Methanobacterium thermoautotrophicum</it> (MT), <it>Pyrococcus abyssi</it> (PA), <it>Pyrococcus horikoshii</it> (PH), and <it>Thermoplasma acidophilum</it> (TA); 14 genomes from bacteria: <it>Aquifex aeolicus</it> (AA), <it>Borrelia burgdorferi</it> (BB), <it>Bacillus subtilis</it> (BS), <it>Chlamydia pneumoniae</it> strain AR39 (CP), <it>Chlamydia trachomatis</it> (CT), <it>E. coli</it> strain K12 (EC), <it>Haemophilus influenzae</it> (HI), <it>Helicobacter pylori</it> strain 26695 (HP), <it>Mycobacterium tuberculosis</it> (MyTu), <it>Mycoplasma genitalium</it> (MG), <it>Mycoplasma pneumoniae</it> (MP), <it>Rickettsia prowazekii</it> (RP), <it>Synechocystis</it> sp. (SS), and <it>Treponema pallidum</it> (TP); and four genomes from eukaryotes: <it>Saccharomyces cerevisiae</it> (SC), <it>D. melanogaster</it> (DM), <it>C. elegans</it> (CE), and <it>Arabidopsis thaliana</it> (AT).</p>
         </sec>
         <sec>
            <st>
               <p>Classification of polytopic membrane protein domains</p>
            </st>
            <p>Figure <figr fid="F1">1a</figr> shows our complete classification procedure. We extracted 8,301 protein entries from the SWISS-PROT database that contain no fewer than two TRANSMEM annotations in the FT field. A total of 52,636 transmembrane (TM) regions in these proteins were mapped to proteins in the Pfam database. By analyzing the location of TM regions in protein domains of each Pfam family, we were able to identify families that contain polytopic membrane protein domains. We followed a relatively conservative procedure to identify potential families of polytopic membrane domains. First, a Pfam family needed to have a significant number of proteins containing no fewer than two TM regions to be identified as a polytopic membrane domain family. Second, all families in Pfam-A and some in Pfam-B (those that have more than seven members) were analyzed, as the Pfam-B database is under development and contains thousands of small protein families. In the end, we identified 183 Pfam-A and 152 Pfam-B families. Proteins in these families contain 36,878 TM regions, representing approximately 70% of the total TM regions extracted from SWISS-PROT. We analyzed the sizes of the loops between all the TM regions, as shown in the inner chart of Figure <figr fid="F1">1</figr>. Within domains as defined by Pfam, most loops (> 95%) are short peptides, containing fewer than 80 amino acids.</p>
            <p>Proteins from 26 genomes were submitted to the TMHMM server for TM-helix prediction [<abbr bid="B6">6</abbr>]. Predicted membrane proteins were searched for polytopic membrane domains, using a rule derived from the above result: loops within a membrane domain must be shorter than 80 amino acids. To identify domains that are included in the Pfam families already identified, we searched the defined polytopic membrane domains for SWISS-PROT ID matches and regional matches. Unmatched domains were further classified on the basis of Pfam's classification, and an additional 48 Pfam-A and 166 Pfam-B families were identified (small Pfam-B families with no fewer than four members and no fewer than three matches were selected). In total, we identified 231 Pfam-A and 318 Pfam-B families as polytopic membrane domains. As not all proteins from the 26 genomes are included in Pfam, we then tried to assign the unclassified polytopic membrane domains to the identified Pfam families by sequence-similarity matching to proteins in these families. We used the FASTA program [<abbr bid="B18">18</abbr>] to search for matches, and matches with <it>E</it>-values less than 0.01 were considered positive. Pfam-A domains could also be assigned using the HMMer software [<abbr bid="B29">29</abbr>], with which they are closely associated. However, we chose a somewhat simpler approach, using FASTA. This is more conservative (it finds fewer homologs) and has the advantage of using consistent thresholds that can be applied to all the searches. Query domains were assigned to the Pfam families to which their best matches belong.</p>
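            <p>The loop-size rule can be illustrated with a minimal sketch. It assumes that predicted TM-helices are available as (start, end) residue positions and that a loop is measured as the number of residues strictly between two consecutive helices; the function name and the input representation are hypothetical and are not taken from the original pipeline.</p>

```python
from typing import List, Tuple

# (start, end) residue positions of one predicted TM-helix, 1-based inclusive
Segment = Tuple[int, int]

MAX_LOOP = 80  # loops inside a domain must be shorter than this (rule from the text)


def split_into_domains(tm_segments: List[Segment],
                       max_loop: int = MAX_LOOP) -> List[List[Segment]]:
    """Group consecutive TM-helices whose connecting loops are shorter than max_loop.

    Only groups with at least two helices (polytopic domains) are returned.
    """
    domains: List[List[Segment]] = []
    current: List[Segment] = []
    for seg in sorted(tm_segments):
        if current:
            # residues strictly between the previous helix and this one
            loop_len = seg[0] - current[-1][1] - 1
            if loop_len >= max_loop:
                # Loop too long: close the current group and start a new one.
                if len(current) >= 2:
                    domains.append(current)
                current = []
        current.append(seg)
    if len(current) >= 2:
        domains.append(current)
    return domains
```

            <p>With this sketch, a protein whose helices are separated by one long loop yields two candidate domains, and an isolated helix is dropped because it cannot form a polytopic domain on its own.</p>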
            <p>For domains that were not classified into Pfam families by either ID match or sequence-similarity match, we tried to cluster them into families on the basis of their sequence similarities. This was done by an all-against-all sequence-similarity search (<it>E</it>-value &lt; 0.01) using FASTA, and polytopic membrane domains were clustered by applying a multiple linkage clustering method [<abbr bid="B30">30</abbr>] to the FASTA results. A family of <it>N</it> members was required to have more than 0.9<it>N</it>(<it>N</it>-1) links among its members, that is, up to 10% of the possible links could be missing. We selected 121 clustered families that contain no fewer than four members, and aligned protein sequences in each family using the CLUSTAL W program [<abbr bid="B31">31</abbr>]. For a complete list of assigned polytopic membrane domains see Additional data files and [<abbr bid="B32">32</abbr>].</p>
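            <p>The linkage criterion can be written as a short check. The sketch below is an assumption about how the threshold might be applied, not the published clustering code: it treats each significant FASTA hit as a directed (query, hit) link, so that a group of N members has N(N-1) possible links, and it only tests whether a given candidate group satisfies the criterion.</p>

```python
from itertools import permutations
from typing import Iterable, Set, Tuple


def satisfies_linkage(members: Iterable[str],
                      links: Set[Tuple[str, str]],
                      fraction: float = 0.9) -> bool:
    """Return True if the group has more than fraction * N * (N - 1) directed links.

    `links` holds (query, hit) pairs for matches that passed the E-value cutoff.
    """
    group = list(members)
    n = len(group)
    if n >= 2:
        observed = sum(1 for pair in permutations(group, 2) if pair in links)
        return observed > fraction * n * (n - 1)
    return False  # a single domain cannot form a family
```

            <p>For a group of four domains there are 12 possible directed links, so the group passes with 11 links but fails with 10.</p>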
         </sec>
         <sec>
            <st>
               <p>TM-helix identification in the families of polytopic membrane domains</p>
            </st>
            <p>We assume that all protein domains in a classified family have a defined number of TM-helices. To identify the number of TM-helices, we made a hydrophobicity plot for each family of polytopic membrane domains. We took the aligned sequences in Pfam's families and in clustered families, and calculated the averaged GES hydrophobicity values [<abbr bid="B8">8</abbr>] of all the residues at each aligned position. (Deleted and inserted residues, represented by '-' and '.' respectively, are given individual values of 0.) The plot for each family was generated from the averaged GES values along their corresponding aligned positions. Most hydrophobic regions were clearly defined, as most TM-helices aligned well in each family. By identifying hydrophobic regions in the plots, we assigned numbers of TM-helices to classified families of polytopic membrane proteins. We also eliminated 3 Pfam-A and 20 Pfam-B families, as they did not contain multiple hydrophobic regions in their hydrophobicity plots. Therefore, we have 228 Pfam-A, 298 Pfam-B and 121 clustered families for further analysis.</p>
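            <p>The per-position averaging can be sketched as follows. The scale is passed in as a parameter; TOY_SCALE below contains made-up placeholder values for three residues and is not the GES scale of [<abbr bid="B8">8</abbr>]. Treating unknown residue symbols as 0 is a further assumption of this sketch.</p>

```python
from typing import Dict, List

GAP_CHARS = "-."  # deleted and inserted residues in the alignment

# Placeholder values for illustration only; NOT the GES scale.
TOY_SCALE: Dict[str, float] = {"L": 2.0, "G": 1.0, "K": -8.0}


def column_hydrophobicity(alignment: List[str],
                          scale: Dict[str, float]) -> List[float]:
    """Average hydrophobicity at each aligned position; gaps contribute 0."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        total = 0.0
        for row in alignment:
            residue = row[col].upper()
            if residue not in GAP_CHARS:
                # residues missing from the scale are treated as 0 (assumption)
                total += scale.get(residue, 0.0)
        # gaps are counted in the denominator, so they pull the average toward 0
        profile.append(total / len(alignment))
    return profile
```

            <p>Because gaps count in the denominator, poorly aligned positions are pulled toward 0, which is consistent with giving gap characters an individual value of 0.</p>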
         </sec>
         <sec>
            <st>
               <p>Analysis of amino-acid distribution and pair motifs</p>
            </st>
            <p>We analyzed 168 Pfam-A families with more than 20 members and generated consensus sequences and sequence logos from all aligned sequences in these families using the Alpro sequence logo program [<abbr bid="B19">19</abbr>]. The selected family size threshold of 20 members is somewhat arbitrary. We chose it because: first, a significant portion (~75%) of the 228 classified Pfam-A families had more than 20 members; and second, the potential bias from small families could be reduced, as they tend to have more conserved residues than large families. However, our results are not affected by changing this threshold. In particular, we analyzed Pfam-A families containing more than 25, 30, 35, or 40 members, and obtained essentially the same results. Amino acids with sequence conservation values (R<sub>sequence</sub>) of no less than 3.0 (top 15% of all values) were considered conserved residues. For all the families, we counted the occurrences of amino acids in the consensus sequences and in all aligned sequences in hydrophobic regions, which are defined as having no fewer than 10 continuous amino acids with a GES hydrophobicity value greater than 0.</p>
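            <p>The definition of a hydrophobic region given above (no fewer than 10 continuous positions with a hydrophobicity value greater than 0) can be sketched as a simple run-finding routine. The input is assumed to be a list of per-position hydrophobicity values; the function name is hypothetical.</p>

```python
from typing import List, Optional, Tuple


def hydrophobic_regions(profile: List[float],
                        min_len: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, 0-based inclusive, of runs with values above 0.

    Only runs of at least min_len consecutive positions are reported.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # close a run that extends to the end of the profile
    if start is not None and len(profile) - start >= min_len:
        regions.append((start, len(profile) - 1))
    return regions
```

            <p>Shorter hydrophobic stretches are ignored, so only segments long enough to be candidate TM-helices are counted.</p>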
            <p>We used the pair definition from a previous study [<abbr bid="B12">12</abbr>]. For example, a pair XY<it>n</it> (X and Y represent amino acids and <it>n</it> a number) corresponds to amino acids X and Y separated by (<it>n</it>-1) residues. We analyzed occurrences of pair motifs of all combinations of amino acids separated by 1 to 10 residues. This result was compared with a previous study of the 200 most significant over-represented pairs [<abbr bid="B12">12</abbr>,<abbr bid="B33">33</abbr>].</p>
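            <p>The pair notation can be made concrete with a short counting sketch. Under the definition above, a pair XY<it>n</it> has X at position <it>i</it> and Y at position <it>i</it> + <it>n</it>, so GxxxG corresponds to GG4. The range of <it>n</it> used here (1 to 10) and the use of plain strings as input are assumptions of this sketch.</p>

```python
from collections import Counter
from typing import Iterable


def count_pairs(sequences: Iterable[str], max_n: int = 10) -> Counter:
    """Count pairs XYn: residue X at position i and residue Y at position i + n."""
    counts: Counter = Counter()
    for seq in sequences:
        for i, x in enumerate(seq):
            for n in range(1, max_n + 1):
                j = i + n
                if j >= len(seq):
                    break  # no partner residue this far along the sequence
                counts[(x, seq[j], n)] += 1
    return counts
```

            <p>For the sequence GAAAG, this sketch counts one GG4 pair, which is the GxxxG motif discussed in the text.</p>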
         </sec>
         <sec>
            <st>
               <p>Analysis of the families of polytopic membrane domain in genomes</p>
            </st>
            <p>Using simple cross-referencing based on the above procedure, proteomic entries in each genome were searched for matches of polytopic membrane domains of classified families. Numbers of membrane domains in classified families were counted and analyzed in all genomes studied.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>A <supplr sid="s1">complete list</supplr> of assigned polytopic membrane domains is available as additional data and from [<abbr bid="B32">32</abbr>].</p>
         <suppl id="s1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>A complete list of assigned polytopic membrane domains</p>
            </caption>
            <text>
               <p>A complete list of assigned polytopic membrane domains</p>
            </text>
            <file name="gb-2002-3-10-research0054-s1.txt">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>M.G. thanks the Keck Foundation for financial support. Y.L. is supported by an NLM postdoctoral fellowship. This research was supported in part by NIH grant T15 LM07056 from the National Library of Medicine. We thank Alessandro Senes and Steven Aller for helpful discussions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Microbial genome analyses: global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities.</p>
            </title>
            <aug>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Sliwinski</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Saier</snm>
                  <fnm>MHJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>277</volume>
            <fpage>573</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1998.1609</pubid>
                  <pubid idtype="pmpid" link="fulltext">9533881</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Microbial genome analyses: comparative transport capabilities in eighteen prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sliwinski</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Rabus</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Saier</snm>
                  <fnm>MHJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>301</volume>
            <fpage>75</fpage>
            <lpage>100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.3961</pubid>
                  <pubid idtype="pmpid" link="fulltext">10926494</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A structural census of genomes: comparing bacterial, eukaryotic, and archaeal genomes in terms of protein structure.</p>
            </title>
            <aug>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>274</volume>
            <fpage>562</fpage>
            <lpage>576</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.1412</pubid>
                  <pubid idtype="pmpid" link="fulltext">9417935</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census.</p>
            </title>
            <aug>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1998</pubdate>
            <volume>33</volume>
            <fpage>518</fpage>
            <lpage>534</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0134(19981201)33:4&lt;518::AID-PROT5&gt;3.0.CO;2-J</pubid>
                  <pubid idtype="pmpid" link="fulltext">9849936</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms.</p>
            </title>
            <aug>
               <au>
                  <snm>Wallin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1998</pubdate>
            <volume>7</volume>
            <fpage>1029</fpage>
            <lpage>1038</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9568909</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larsson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>305</volume>
            <fpage>567</fpage>
            <lpage>580</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4315</pubid>
                  <pubid idtype="pmpid" link="fulltext">11152613</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A simple method for displaying the hydropathic character of a protein.</p>
            </title>
            <aug>
               <au>
                  <snm>Kyte</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>RF</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1982</pubdate>
            <volume>157</volume>
            <fpage>105</fpage>
            <lpage>132</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7108955</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Engelman</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Steitz</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Annu Rev Biophys Biophys Chem</source>
            <pubdate>1986</pubdate>
            <volume>15</volume>
            <fpage>321</fpage>
            <lpage>353</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bb.15.060186.001541</pubid>
                  <pubid idtype="pmpid">3521657</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule.</p>
            </title>
            <aug>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1992</pubdate>
            <volume>225</volume>
            <fpage>487</fpage>
            <lpage>494</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1593632</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A mutation data matrix for transmembrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1994</pubdate>
            <volume>339</volume>
            <fpage>269</fpage>
            <lpage>275</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0014-5793(94)80429-X</pubid>
                  <pubid idtype="pmpid">8112466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Statistical analysis of predicted transmembrane alpha-helices.</p>
            </title>
            <aug>
               <au>
                  <snm>Arkin</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Brunger</snm>
                  <fnm>AT</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>1998</pubdate>
            <volume>1429</volume>
            <fpage>113</fpage>
            <lpage>128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0167-4838(98)00225-8</pubid>
                  <pubid idtype="pmpid">9920390</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions.</p>
            </title>
            <aug>
               <au>
                  <snm>Senes</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Engelman</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>296</volume>
            <fpage>921</fpage>
            <lpage>936</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.3488</pubid>
                  <pubid idtype="pmpid" link="fulltext">10677292</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The GxxxG motif: a framework for transmembrane helix-helix association.</p>
            </title>
            <aug>
               <au>
                  <snm>Russ</snm>
                  <fnm>WP</fnm>
               </au>
               <au>
                  <snm>Engelman</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>296</volume>
            <fpage>911</fpage>
            <lpage>919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.3489</pubid>
                  <pubid idtype="pmpid" link="fulltext">10677291</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The Calpha-H...O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions.</p>
            </title>
            <aug>
               <au>
                  <snm>Senes</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ubarretxena-Belandia</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Engelman</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>9056</fpage>
            <lpage>9061</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55372</pubid>
                  <pubid idtype="pmpid" link="fulltext">11481472</pubid>
                  <pubid idtype="doi">10.1073/pnas.161280798</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The Pfam protein families database.</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>263</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102420</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592242</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.263</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>45</fpage>
            <lpage>48</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102476</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592178</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.45</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>How representative are the known structures of the proteins in a complete genome? A comprehensive structural census.</p>
            </title>
            <aug>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Fold Des</source>
            <pubdate>1998</pubdate>
            <volume>3</volume>
            <fpage>497</fpage>
            <lpage>512</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9889159</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Improved tools for biological sequence comparison.</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">280013</pubid>
                  <pubid idtype="pmpid">3162770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Sequence logos: a new way to display consensus sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Stephens</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1990</pubdate>
            <volume>18</volume>
            <fpage>6097</fpage>
            <lpage>6100</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2172928</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <aug>
               <au>
                  <snm>Branden</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tooze</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Introduction to Protein Structure.</source>
            <publisher>London: Garland Publishing</publisher>
            <pubdate>1991</pubdate>
         </bibl>
         <bibl id="B21">
            <title>
               <p>ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons.</p>
            </title>
            <aug>
               <au>
                  <snm>Corpet</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Servant</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>267</fpage>
            <lpage>269</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102458</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592243</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.267</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Structure of MsbA from <it>E. coli</it>: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters.</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <fpage>1793</fpage>
            <lpage>1800</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.293.5536.1793</pubid>
                  <pubid idtype="pmpid" link="fulltext">11546864</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The structure of the potassium channel: molecular basis of K<sup>+</sup> conduction and selectivity.</p>
            </title>
            <aug>
               <au>
                  <snm>Doyle</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Morais Cabral</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pfuetzner</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kuo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gulbis</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Chait</snm>
                  <fnm>BT</fnm>
               </au>
               <au>
                  <snm>MacKinnon</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>280</volume>
            <fpage>69</fpage>
            <lpage>77</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.280.5360.69</pubid>
                  <pubid idtype="pmpid" link="fulltext">9525859</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Structure of the MscL homolog from <it>Mycobacterium tuberculosis</it>: a gated mechanosensitive ion channel.</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Spencer</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Barclay</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Rees</snm>
                  <fnm>DC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>2220</fpage>
            <lpage>2226</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5397.2220</pubid>
                  <pubid idtype="pmpid" link="fulltext">9856938</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Structural determinants of water permeation through aquaporin-1.</p>
            </title>
            <aug>
               <au>
                  <snm>Murata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mitsuoka</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hirai</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Walz</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Agre</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Heymann</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Engel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fujiyoshi</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>407</volume>
            <fpage>599</fpage>
            <lpage>605</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35036519</pubid>
                  <pubid idtype="pmpid" link="fulltext">11034202</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Structure of a glycerol-conducting channel and the basis for its selectivity.</p>
            </title>
            <aug>
               <au>
                  <snm>Fu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Libson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Miercke</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Weitzman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nollert</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Krucinski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stroud</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>290</volume>
            <fpage>481</fpage>
            <lpage>486</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.290.5491.481</pubid>
                  <pubid idtype="pmpid" link="fulltext">11039922</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Neurobiology of the <it>Caenorhabditis elegans</it> genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Bargmann</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>2028</fpage>
            <lpage>2033</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5396.2028</pubid>
                  <pubid idtype="pmpid" link="fulltext">9851919</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Biswas</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kanapin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Karavidopoulou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kersey</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kriventseva</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Mittard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Mulder</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>44</fpage>
            <lpage>48</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29822</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125045</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.44</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Profile hidden Markov models.</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid>
                  <pubid idtype="pmpid" link="fulltext">9918945</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <aug>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rousseeuw</snm>
                  <fnm>PJ</fnm>
               </au>
            </aug>
            <source>Finding Groups in Data: An Introduction to Cluster Analysis.</source>
            <publisher>New York: John Wiley and Sons</publisher>
            <pubdate>1990</pubdate>
         </bibl>
         <bibl id="B31">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7984417</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Index of genome/tms</p>
            </title>
            <url>http://bioinfo.mbb.yale.edu/genome/tms</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>TMSTAT: statistical analysis of transmembrane sequences</p>
            </title>
            <url>http://engelman.csb.yale.edu/tmstat/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
