<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-11-r236</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>Inferring genome-scale rearrangement phylogeny and ancestral gene order: a <it>Drosophila </it>case study</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Bhutkar</snm>
               <fnm>Arjun</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>arjunb@morgan.harvard.edu</email>
            </au>
            <au id="A2">
               <snm>Gelbart</snm>
               <mi>M</mi>
               <fnm>William</fnm>
               <insr iid="I2"/>
               <email>gelbart@morgan.harvard.edu</email>
            </au>
            <au id="A3">
               <snm>Smith</snm>
               <mi>F</mi>
               <fnm>Temple</fnm>
               <insr iid="I1"/>
               <email>tsmith@darwin.bu.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>BioMolecular Engineering Research Center, Boston University, Cummington St, Boston, MA 02215, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>11</issue>
         <fpage>R236</fpage>
         <url>http://genomebiology.com/2007/8/11/R236</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17996033</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-11-r236</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>6</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>17</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </revrec>
         <pub>
            <date>
               <day>08</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Bhutkar et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Rearrangement phylogeny and ancestral gene order</p>
      </shorttitle>
      <shortabs>
         <p>A simple, fast, and biologically-inspired computational approach to infer genome-scale rearrangement phylogeny and ancestral gene order has been developed and applied to eight Drosophila genomes, providing insights into evolutionary chromosomal dynamics.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
             <p>A simple, fast, and biologically inspired computational approach for inferring genome-scale rearrangement phylogeny and ancestral gene order has been developed and applied to eight <it>Drosophila </it>genomes. Existing techniques are limited either to a few hundred markers or to a small number of taxa. This analysis uses over 14,000 genomic loci and employs discrete characters, each consisting of a pair of adjacent homologous genetic elements. The results provide insight into evolutionary chromosomal dynamics and synteny analysis, and inform speciation studies.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010015">Model organisms</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Chromosomal rearrangements have been studied in <it>Drosophila </it>since the early 20th century, originally via optical observation of banding patterns <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Chromosomal inversions have been inferred from such observations as well as from other genomic marker pairs <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. These inversions and clusters of banding patterns have also been used to study evolutionary history <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, adaptation, and speciation <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. More recently, the identification and analysis of gene synteny (conserved blocks of ordered genes) has been used to infer evolutionary rearrangements and relationships among organisms from bacteria <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> to <it>Drosophila </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and mammals <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The primary motivation for this work is to provide a fast computational method that derives phylogenetic relationships and estimates rearrangement counts and ancestral gene order for large datasets, while overcoming the limitations of current gene-order-based methods. As described below, these methods either do not converge on a solution for large datasets or are limited by execution speed and input data size to a few hundred markers or a small number of taxa.</p>
          <p>There have been a number of modern approaches to full-genome comparative analysis and gene order analysis <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Parsimonious methods based on gene order analysis usually begin with a search for homologous genes and the identification of syntenic gene clusters. They have generally been limited by the need to compensate, insofar as possible, for homolog uncertainty in the presence of paralogs, and for missing data in assembly gaps. Such approaches usually build a graphical representation to map the synteny linkage between pairs of chromosomes. These graphical representations can be processed computationally via various algorithmic approaches <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp> to find the minimum number and specific types of genetic events that would result in the observed mapping, thus providing an estimate of the distance between genomes. Methods based on gene order and gene content data have been investigated in detail <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, with particular attention to the computational issues involved. The general computational problem of reconstructing a phylogeny from gene order data is NP-hard <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp> and various heuristics have been employed <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
          <p>The study of genome rearrangements is an important tool for understanding evolutionary events. Previous approaches using pairs of chromosome bands <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, multidirectional chromosome painting <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and pairs of adjacent genes to study rates of genome shuffling <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> have shown how rearrangements affect genome organization during evolution. These studies provide part of the motivation for the method presented here.</p>
          <p>Comparative analysis of insect genomes is expected to yield significant insights into evolution, development, and regulation <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. With the availability of a large number of fully sequenced genomes, particularly from closely related species, there is now a need to revisit such methodologies with the aim of reconstructing detailed genome-wide evolutionary histories. The recently sequenced genomes of a large number of fruit fly (<it>Drosophila</it>) species (Drosophila 12 Genomes Consortium, 2007) and other insects provide an ideal data set for this purpose. The currently assumed phylogenetic relationships between various fly species <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp> involve species thought to have diverged from 5 million to about 50 million years ago. Research on <it>D</it>. <it>melanogaster </it>(<it>Dmel</it>) has provided a wealth of tools and resources <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> over the years, including the well-annotated <it>D</it>. <it>melanogaster </it>genome sequence <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
          <p>Chromosomal translocations are rare in <it>Drosophila </it>species <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Most genes remain on the same arm, or Muller element <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, with reshuffling along the arm due to paracentric inversions. This potentially simplifies the analysis of rearrangements. While gene translocation via retrotransposition <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp> does occur (Bhutkar A, Russo S, Smith TF, Gelbart WM, Genome Scale Analysis of Positionally Relocated Genes, <it>Genome Research (in press)</it>), it appears to be rare <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Over the course of the 20th century, <it>Drosophila </it>phylogeny was estimated using a number of high-level methods, such as morphological analysis, geographical distribution, limited genetic analysis, and sequence variation in a small set of genes.</p>
          <p>The techniques and results presented in this study support the recently updated phylogenetic grouping of <it>Drosophila yakuba </it>(<it>Dyak</it>) and <it>Drosophila erecta </it>(<it>Dere</it>), validate the assumed <it>Drosophila </it>phylogeny for the remaining species, and estimate the number of fixed chromosomal rearrangement breaks from a genome-scale analysis involving over 14,000 (over 32,000 including outgroup species) precise molecular markers. While accommodating gene translocation between arms, as well as paracentric and pericentric inversions, this approach uses neighboring gene pairs (NGPs) across multiple closely related species to infer evolutionary relationships, a rearrangement phylogeny, and ancestral syntenic arrangements. The fundamental, biologically inspired idea is that inversions are rare events, that pairs of adjacent genetic loci observed in multiple species probably existed in their common ancestor, and that each inversion disrupts two pairs of neighboring genetic elements and creates two new pairs. In other words, the likelihood of two independent inversions in disjoint lineages creating the same pair of adjacent genetic loci is low. This approach is a significant advance over existing techniques in its speed, in its ability to handle large datasets that were previously unmanageable, and in its ability to process preliminary genome assembly data, as outlined in the Discussion section. The results place <it>Drosophila </it>inter-species rearrangement relationships on a solid footing. Furthermore, chromosomal inversions have been mapped to specific branches of the tree for all species, and previously unknown <it>Drosophila </it>ancestral gene arrangements have been inferred. The analysis also quantifies and highlights particular lineages and species that have undergone a high level of chromosomal rearrangement, thus supplying critical information for speciation studies.</p>
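          <p>The core idea can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a chromosome arm is a list of gene identifiers and that an NGP is an unordered pair of adjacent genes (orientation ignored); the helper names ngps and invert and the toy gene order are hypothetical. An inversion strictly inside the arm removes two ancestral NGPs and creates two new ones, while pairs inside the inverted segment survive in reversed order.</p>
```python
# Hypothetical helpers illustrating neighboring gene pairs (NGPs) on toy data.

def ngps(order):
    """Return the set of unordered adjacent gene pairs along one arm."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def invert(order, i, j):
    """Paracentric inversion of the segment order[i:j] (0-based, end-exclusive)."""
    return order[:i] + order[i:j][::-1] + order[j:]

ancestor = ["a", "b", "c", "d", "e", "f"]
derived = invert(ancestor, 2, 4)  # ['a', 'b', 'd', 'c', 'e', 'f']

lost = ngps(ancestor) - ngps(derived)    # ancestral pairs disrupted
gained = ngps(derived) - ngps(ancestor)  # new pairs created
print(sorted(sorted(p) for p in lost))    # [['b', 'c'], ['d', 'e']]
print(sorted(sorted(p) for p in gained))  # [['b', 'd'], ['c', 'e']]
```
          <p>The pair c-d inside the inverted segment is retained (as d-c), which is why the sketch treats NGPs as unordered pairs.</p>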
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
          <p>The 8,967 high-confidence genes common to all <it>Drosophila </it>species (Additional data file 1) yielded 14,947 arm-indexed NGPs (Additional data file 2) across the <it>Drosophila </it>species, excluding outgroup species. Clustering these arm-indexed NGPs to maximize 'exclusively shared NGPs' produced a species partitioning for initial phylogenetic analysis (Figure <figr fid="F1">1a</figr> and Additional data file 4); see Materials and methods for details on this similarity maximization metric and the motivation behind it. These phylogenetic relationships validate the currently accepted placement of <it>D. yakuba </it>on the evolutionary tree <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>, which is also supported by a meta-centric inversion shared with <it>D. erecta </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
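          <p>A minimal sketch of the 'exclusively shared NGPs' score, following the definition in the Figure 1 legend (NGPs found in every species of a group and in no species outside it), is shown below together with a hypothetical greedy agglomerative clustering that repeatedly merges the pair of clusters with the highest score. The merge procedure, the function names, and the toy NGP sets are assumptions for illustration; the actual similarity maximization is described in Materials and methods.</p>
```python
from itertools import combinations

def exclusively_shared(ngp_sets, group):
    """NGPs found in every species of `group` and in no species outside it."""
    inside = set.intersection(*(ngp_sets[s] for s in group))
    outside = set().union(*(ngp_sets[s] for s in ngp_sets if s not in group))
    return inside - outside

def greedy_partition(ngp_sets):
    """Hypothetical agglomerative clustering: repeatedly merge the pair of
    clusters whose union has the most exclusively shared NGPs."""
    clusters = [frozenset([s]) for s in ngp_sets]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda p: len(exclusively_shared(ngp_sets, p[0] | p[1])))
        merged = a | b
        score = len(exclusively_shared(ngp_sets, merged))
        merges.append((tuple(sorted(merged)), score))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return merges

# Toy NGP sets; strings stand in for gene pairs.
ngp_sets = {
    "A": {"g1-g2", "ab1", "ab2", "a1"},
    "B": {"g1-g2", "ab1", "ab2"},
    "C": {"g1-g2", "cd1"},
    "D": {"g1-g2", "cd1", "d1"},
}
for members, score in greedy_partition(ngp_sets):
    print(members, score)
# ('A', 'B') 2
# ('C', 'D') 1
# ('A', 'B', 'C', 'D') 1
```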
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Partitioning of various <it>Drosophila </it>species and outgroup species (<it>Anopheles gambiae </it>(<it>Agam</it>), <it>Aedes aegypti </it>(<it>Aaeg</it>), <it>Apis mellifera </it>(<it>Amel</it>), and <it>Tribolium castaneum </it>(<it>Tcas</it>)) based on 'exclusively shared NGPs' (NGPs found in each species in a clustered group and not found in any species outside this group - see Materials and methods)</p>
            </caption>
            <text>
                <p>Partitioning of various <it>Drosophila </it>species and outgroup species (<it>Anopheles gambiae </it>(<it>Agam</it>), <it>Aedes aegypti </it>(<it>Aaeg</it>), <it>Apis mellifera </it>(<it>Amel</it>), and <it>Tribolium castaneum </it>(<it>Tcas</it>)) based on 'exclusively shared NGPs' (NGPs found in each species in a clustered group and not found in any species outside this group - see Materials and methods). A box around a pair of species, a cluster and a species, or two clusters, signifies that they are inferred to be grouped together in the phylogeny. Numbers denote the number of 'exclusively shared NGPs' unique to each cluster. <b>(a) </b>Arm-indexed clustering within genus <it>Drosophila</it>. Genes with orthologs in all genus <it>Drosophila </it>species (see Materials and methods for species names) are chosen to form NGPs. This clustering reveals subgenus <it>Drosophila</it>, subgenus <it>Sophophora </it>and <it>melanogaster </it>subgroup species to be distinct clusters. This binary partitioning validates the placement of <it>Dyak </it>(see text) and agrees with the currently understood phylogenetic relationships between other <it>Drosophila </it>species (see Discussion for details). <b>(b) </b>Relaxed clustering without arm indexing for NGPs, in order to include outgroup species that differ in chromosomal architecture (see Materials and methods). The set of common genes between all species, including outgroup species, is used to derive NGPs. Relaxing arm indexing results in loss of signal within the closely related <it>melanogaster </it>subgroup species (<it>Dmel</it>, <it>Dyak</it>, <it>Dere</it>) where <it>Dmel </it>+ <it>Dere</it>, <it>Dyak </it>+ <it>Dere</it>, and <it>Dmel </it>+ <it>Dyak </it>are weak clusters with 16, 15, and 9 exclusively shared NGPs, respectively. See Discussion and Materials and methods for details.</p>
            </text>
            <graphic file="gb-2007-8-11-r236-1"/>
         </fig>
          <p>To test this method with distant outgroup species, a set of high-confidence genes common to the <it>Drosophila </it>species and four outgroup species was chosen, and the arm-indexing requirement for NGPs was relaxed to allow for the varying chromosomal architecture of the outgroup species. This resulted in a set of 4,085 genes and 19,416 NGPs, which were clustered using the same similarity maximization metric (Figure <figr fid="F1">1b</figr> and Additional data file 5). A loss of signal for the closely related species (<it>Dmel</it>, <it>Dyak</it>, <it>Dere</it>) is evident because arm indexing is no longer applied; see Discussion for details. For validation, a maximum likelihood gene tree was generated using two universal eukaryotic genes (<it>SRP54 </it>and <it>SRP19</it>) thought to be under minimal species-specific selection. The resulting gene tree (Figure <figr fid="F2">2</figr>) has a topology identical to that of the partitioning in Figure <figr fid="F1">1a</figr>.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Maximum likelihood gene tree generated with PHYLIP version 3.65 using amino acid sequences for proteins SRP54 and SRP19 from various genus <it>Drosophila </it>species and <it>Anopheles gambiae </it>(Agam) as the outgroup species</p>
            </caption>
            <text>
                <p>Maximum likelihood gene tree generated with PHYLIP version 3.65 using amino acid sequences for proteins SRP54 and SRP19 from various genus <it>Drosophila </it>species and <it>Anopheles gambiae </it>(Agam) as the outgroup species. Data for the tree are also provided in Additional data file 9. The tree has been artificially rooted with the outgroup species (Agam). Numbers reflect the relative branch lengths from this root. Species within subgenus <it>Drosophila </it>(<it>Dvir</it>, <it>Dmoj</it>, <it>Dgri</it>) show lower overall average branch lengths than species within subgenus <it>Sophophora</it>, similar to Figure 3.</p>
            </text>
            <graphic file="gb-2007-8-11-r236-2"/>
         </fig>
          <p>To infer <it>Drosophila </it>ancestral adjacencies, the set of genes common to the <it>Drosophila </it>species (8,967 genes) was chosen, the arm-indexing criterion was relaxed to allow for varying chromosome architecture, and four outgroup species were added to form the set of NGPs. This resulted in a total of 32,154 NGPs (Additional data file 3), of which 14,162 were contributed by one or more <it>Drosophila </it>species. The count of <it>Drosophila </it>NGPs drops from 14,947 arm-indexed NGPs to 14,162 as a result of relaxing the arm-indexing requirement.</p>
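          <p>One way the NGP count can shrink when arm indexing is relaxed is sketched below. It assumes, as an illustration only, that an arm-indexed NGP is a gene pair tagged with the arm (Muller element) on which it is observed, so the same pair seen on two different arms counts twice; once the arm label is dropped, the two collapse into one. This interpretation, the function names, and the toy genomes are assumptions; the precise definition is given in Materials and methods.</p>
```python
def arm_indexed_ngps(genome):
    """genome maps an arm label to its ordered gene list; returns (arm, pair) tuples."""
    return {(arm, frozenset(pair))
            for arm, order in genome.items()
            for pair in zip(order, order[1:])}

def relaxed_ngps(genome):
    """Drop the arm label, keeping only the unordered gene pair."""
    return {pair for _arm, pair in arm_indexed_ngps(genome)}

# Toy genomes: the a-b adjacency lies on 2L in sp1 but on 3R in sp2.
species = {
    "sp1": {"2L": ["a", "b", "c"], "3R": ["d", "e"]},
    "sp2": {"2L": ["c"], "3R": ["a", "b", "d", "e"]},
}
all_indexed = set().union(*(arm_indexed_ngps(g) for g in species.values()))
all_relaxed = set().union(*(relaxed_ngps(g) for g in species.values()))
print(len(all_indexed), len(all_relaxed))  # 5 4
```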
          <p>Starting from the NGP phylogeny inferred earlier (Figure <figr fid="F1">1a</figr>), an iterative walk down and up this implied phylogeny yields estimates of the number of fixed rearrangement breaks along each branch of the tree (Figure <figr fid="F3">3</figr>), as outlined in Materials and methods. For a given node, the rearrangement phylogeny estimates a lower bound on the number of disruptions of NGPs that existed in the immediate ancestor. Ambiguous cases are handled as discussed in Materials and methods, using evidence from outgroup species wherever applicable. An estimate of the inversion count, that is, the number of inversion events that produced the observed rearrangements, can be computed from a rearrangement phylogeny: because each inversion disrupts two ancestral gene pairs and creates two new pairs, it is roughly half the number of inferred rearrangement breaks.</p>
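          <p>The per-branch bookkeeping can be sketched as a set difference: ancestral NGPs missing from a descendant are counted as breaks on that branch, and since an internal inversion disrupts two ancestral NGPs, a rough inversion count is half the number of breaks, rounded up. This is a simplified illustration under stated assumptions (the ancestral NGP set is already known, every break comes from an inversion, and breakpoint reuse is ignored); the function names and the toy gene order are hypothetical.</p>
```python
import math

def ngps(order):
    """Unordered adjacent gene pairs along one arm (orientation ignored)."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def invert(order, i, j):
    """Reverse the segment order[i:j] (0-based, end-exclusive)."""
    return order[:i] + order[i:j][::-1] + order[j:]

def branch_breaks(parent_ngps, child_ngps):
    """Ancestral NGPs absent from the descendant: breaks fixed on this branch."""
    return parent_ngps - child_ngps

def inversion_estimate(n_breaks):
    """Each internal inversion disrupts two ancestral NGPs, so round half up."""
    return math.ceil(n_breaks / 2)

# Toy example: a parent arm and a child that underwent two inversions.
parent = list("abcdefgh")
child = invert(invert(parent, 1, 3), 5, 7)  # a c b d e g f h
breaks = branch_breaks(ngps(parent), ngps(child))
print(len(breaks), inversion_estimate(len(breaks)))  # 4 2
```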
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Rearrangement phylogeny for genus <it>Drosophila</it></p>
            </caption>
            <text>
                <p>Rearrangement phylogeny for genus <it>Drosophila</it>. The number along each branch of the tree shows the probable number of fixed rearrangement breaks inferred along that evolutionary branch. Each inferred rearrangement break corresponds to the disruption of a gene pair (NGP) inferred to exist in the immediate ancestor. Consequently, these counts include both macro- and micro-syntenic disruptions. See Materials and methods for details on the handling of ambiguous cases. Rearrangement breaks are assumed to occur as a result of chromosomal inversion events. Estimates for inversion counts can be computed from these data as outlined in the Materials and methods. The total number of inferred fixed rearrangement breaks for each genus <it>Drosophila </it>species, from the <it>Drosophila </it>root, is shown alongside the species name. <it>Anopheles gambiae </it>(shown), <it>Aedes aegypti</it>, <it>Apis mellifera</it>, and <it>Tribolium castaneum </it>are also used as outgroup species. Subgenus <it>Drosophila </it>species show lower overall average branch lengths than subgenus <it>Sophophora </it>species. Dashed lines at the subgenus <it>Sophophora </it>and subgenus <it>Drosophila </it>nodes reflect the loss of genus-specific NGP signal at the genus <it>Drosophila </it>root, which is only partially compensated for by distant outgroup species. See Discussion for details.</p>
            </text>
            <graphic file="gb-2007-8-11-r236-3"/>
         </fig>
          <p>Comparison with known rearrangements in the <it>eve </it>region of <it>Drosophila </it><abbrgrp><abbr bid="B42">42</abbr></abbrgrp> shows that the adjacency between genes CG2328 and CG2331 is captured in three species (<it>Dmel</it>, <it>Dere</it>, <it>Dyak</it>) and is absent in the other species, as expected. CG2328 is adjacent to CG30421 in the other species, and this adjacency is inferred to be ancestral because evidence for it straddles the <it>Drosophila </it>root; this points to a rearrangement on the branch leading to <it>Dmel</it>, <it>Dere</it>, and <it>Dyak</it>. Similarly, a comparison with rearrangements reported earlier in the <it>lab-pb </it>region <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> shows that the <it>lab-pb </it>neighborhood is correctly captured as an adjacency in <it>Dmel </it>and <it>Dpse</it>. It is also inferred to be an ancestral adjacency, based on evidence from subgenus <it>Sophophora </it>species, in line with earlier analysis <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>.</p>
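          <p>The <it>eve </it>region inference can be sketched with a simplified rule: an adjacency is placed at the root if the species carrying it include members of both child clades (it straddles the root), or one child clade plus an outgroup. The rule and function name below are a hypothetical simplification of the procedure in Materials and methods; the species supporting each adjacency follow the text above, restricted to species named in this section.</p>
```python
def inferred_at_root(ngp, observed, clade1, clade2, outgroups=()):
    """Hypothetical simplified rule: keep an NGP at the root if the species
    carrying it straddle the root (both child clades), or if it is seen in one
    child clade and in an outgroup species."""
    seen = observed.get(ngp, set())
    in1 = bool(seen.intersection(clade1))
    in2 = bool(seen.intersection(clade2))
    in_out = bool(seen.intersection(outgroups))
    return (in1 and in2) or ((in1 or in2) and in_out)

sophophora = {"Dmel", "Dere", "Dyak", "Dpse"}
drosophila = {"Dvir", "Dmoj", "Dgri"}
observed = {
    ("CG2328", "CG2331"): {"Dmel", "Dere", "Dyak"},
    ("CG2328", "CG30421"): {"Dpse", "Dvir", "Dmoj", "Dgri"},
}
for pair in observed:
    print(pair, inferred_at_root(pair, observed, sophophora, drosophila))
# ('CG2328', 'CG2331') False
# ('CG2328', 'CG30421') True
```
          <p>Because CG2328-CG30421 is inferred at the root but is missing in <it>Dmel</it>, <it>Dere</it>, and <it>Dyak</it>, its disruption is assigned to the branch leading to those three species.</p>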
          <p>A comparison of the relative number of ancestral syntenic blocks, and of the gene count in those blocks, under the various assumptions used in this method is shown in Figure <figr fid="F4">4</figr>. The distribution of ancestral syntenic block sizes, in terms of gene count, at the root of the genus <it>Drosophila </it>tree computed by this method under various criteria is presented in Table <tblr tid="T1">1</tblr> (see also Additional data files 6 and 7). The largest ancestral syntenic block at the genus <it>Drosophila </it>root has 61 genes under the most relaxed assumptions (criterion 3). Of the 13,706 euchromatic genes annotated in FlyBase release 4.3 <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, genes lacking strong homologous placements in one or more species, or excluded by other criteria (embedded genes, assembly gaps, and so on), were filtered out, leaving a set of 8,967 common genes (Additional data file 1) for this analysis. This is a conservative set that can be expanded as better homology data become available across species. Under the most relaxed assumptions (criterion 3), a little over 73% (62% for criterion 1; 63% for criterion 2) of these 8,967 <it>D. melanogaster </it>annotated genes were placed in ancestral syntenic blocks of more than five genes at the root of the genus <it>Drosophila </it>tree, and approximately 30% (14% for criterion 1; 15% for criterion 2) were placed in blocks of 20 or more genes. With respect to rearrangement activity within <it>Drosophila </it>species, 3,691 (41%) of the 8,967 common genes were seen in only two NGPs across all species, and the rest were observed in three or more NGPs.</p>
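          <p>Ancestral syntenic blocks and a size distribution of the kind tabulated in Table 1 can be sketched by chaining inferred root adjacencies into connected groups of genes and tallying their sizes. This sketch assumes each gene has at most two ancestral neighbors (no conflicting adjacencies); the function names and the toy adjacencies are hypothetical, and it is not the authors' implementation.</p>
```python
from collections import Counter, defaultdict

def syntenic_blocks(adjacencies):
    """Group genes linked by ancestral adjacencies into blocks (connected components)."""
    graph = defaultdict(set)
    for a, b in adjacencies:
        graph[a].add(b)
        graph[b].add(a)
    seen, blocks = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, block = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal of one block
            gene = stack.pop()
            block.append(gene)
            for nxt in graph[gene]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        blocks.append(block)
    return blocks

def size_distribution(blocks, min_size=3):
    """Count blocks by gene count, keeping only blocks of at least min_size genes."""
    return Counter(len(b) for b in blocks if len(b) >= min_size)

adjacencies = [("g1", "g2"), ("g2", "g3"),
               ("g4", "g5"), ("g5", "g6"), ("g6", "g7"),
               ("g8", "g9"),
               ("g10", "g11"), ("g11", "g12")]
blocks = syntenic_blocks(adjacencies)
print(sorted(size_distribution(blocks).items()))  # [(3, 2), (4, 1)]
```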
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Distribution of syntenic block sizes (&#8805;3 genes) at the root of the <it>Drosophila </it>tree under various relaxed criteria</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c cspan="3" ca="center">
                     <p>No. of blocks</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Syntenic block size (no. of genes)</p>
                  </c>
                  <c ca="right">
                     <p>Criterion 1</p>
                  </c>
                  <c ca="right">
                     <p>Criterion 2</p>
                  </c>
                  <c ca="right">
                     <p>Criterion 3</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>3</p>
                  </c>
                  <c ca="right">
                     <p>283</p>
                  </c>
                  <c ca="right">
                     <p>279</p>
                  </c>
                  <c ca="right">
                     <p>162</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>4</p>
                  </c>
                  <c ca="right">
                     <p>119</p>
                  </c>
                  <c ca="right">
                     <p>113</p>
                  </c>
                  <c ca="right">
                     <p>57</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>5</p>
                  </c>
                  <c ca="right">
                     <p>137</p>
                  </c>
                  <c ca="right">
                     <p>136</p>
                  </c>
                  <c ca="right">
                     <p>67</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>6</p>
                  </c>
                  <c ca="right">
                     <p>86</p>
                  </c>
                  <c ca="right">
                     <p>82</p>
                  </c>
                  <c ca="right">
                     <p>52</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>7</p>
                  </c>
                  <c ca="right">
                     <p>73</p>
                  </c>
                  <c ca="right">
                     <p>71</p>
                  </c>
                  <c ca="right">
                     <p>55</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>8</p>
                  </c>
                  <c ca="right">
                     <p>54</p>
                  </c>
                  <c ca="right">
                     <p>59</p>
                  </c>
                  <c ca="right">
                     <p>53</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>9</p>
                  </c>
                  <c ca="right">
                     <p>34</p>
                  </c>
                  <c ca="right">
                     <p>30</p>
                  </c>
                  <c ca="right">
                     <p>29</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>10</p>
                  </c>
                  <c ca="right">
                     <p>39</p>
                  </c>
                  <c ca="right">
                     <p>35</p>
                  </c>
                  <c ca="right">
                     <p>33</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>11</p>
                  </c>
                  <c ca="right">
                     <p>36</p>
                  </c>
                  <c ca="right">
                     <p>33</p>
                  </c>
                  <c ca="right">
                     <p>27</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>12</p>
                  </c>
                  <c ca="right">
                     <p>26</p>
                  </c>
                  <c ca="right">
                     <p>28</p>
                  </c>
                  <c ca="right">
                     <p>19</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>13</p>
                  </c>
                  <c ca="right">
                     <p>22</p>
                  </c>
                  <c ca="right">
                     <p>23</p>
                  </c>
                  <c ca="right">
                     <p>28</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>14</p>
                  </c>
                  <c ca="right">
                     <p>22</p>
                  </c>
                  <c ca="right">
                     <p>24</p>
                  </c>
                  <c ca="right">
                     <p>27</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>15</p>
                  </c>
                  <c ca="right">
                     <p>16</p>
                  </c>
                  <c ca="right">
                     <p>15</p>
                  </c>
                  <c ca="right">
                     <p>18</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>16</p>
                  </c>
                  <c ca="right">
                     <p>11</p>
                  </c>
                  <c ca="right">
                     <p>13</p>
                  </c>
                  <c ca="right">
                     <p>14</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>17</p>
                  </c>
                  <c ca="right">
                     <p>9</p>
                  </c>
                  <c ca="right">
                     <p>12</p>
                  </c>
                  <c ca="right">
                     <p>10</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>18</p>
                  </c>
                  <c ca="right">
                     <p>9</p>
                  </c>
                  <c ca="right">
                     <p>10</p>
                  </c>
                  <c ca="right">
                     <p>9</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>19</p>
                  </c>
                  <c ca="right">
                     <p>6</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
                  <c ca="right">
                     <p>7</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>20</p>
                  </c>
                  <c ca="right">
                     <p>7</p>
                  </c>
                  <c ca="right">
                     <p>7</p>
                  </c>
                  <c ca="right">
                     <p>9</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>21</p>
                  </c>
                  <c ca="right">
                     <p>6</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
                  <c ca="right">
                     <p>7</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>22</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
                  <c ca="right">
                     <p>7</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>23</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>3</p>
                  </c>
                  <c ca="right">
                     <p>6</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>24</p>
                  </c>
                  <c ca="right">
                     <p>4</p>
                  </c>
                  <c ca="right">
                     <p>4</p>
                  </c>
                  <c ca="right">
                     <p>6</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>25</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>4</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>26</p>
                  </c>
                  <c ca="right">
                     <p>4</p>
                  </c>
                  <c ca="right">
                     <p>4</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>27</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>6</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>28</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>29</p>
                  </c>
                  <c ca="right">
                     <p>4</p>
                  </c>
                  <c ca="right">
                     <p>3</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>30</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>4</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>31</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>32</p>
                  </c>
                  <c ca="right">
                     <p>3</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>33</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
                  <c ca="right">
                     <p>3</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>35</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>36</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>38</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>39</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>40</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>41</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>43</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>44</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>46</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>47</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>2</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>48</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>51</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>54</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>61</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>0</p>
                  </c>
                  <c ca="right">
                     <p>1</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Criterion 2 relies on weaker assumptions than criterion 1, and criterion 3 on the weakest assumptions. Criterion 1: first-pass syntenic blocks. Criterion 2: result of bridging syntenic blocks using pairs of block-edge genes supported by outgroup species evidence. Criterion 3: further merging of syntenic blocks under the relaxed assumption that blocks can be bridged using block-edge genes paired in at least one fly species. See Additional data files 6 and 7 for gene composition of blocks.</p>
            </tblfn>
         </tbl>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Comparison between number of syntenic blocks and total number of genes in syntenic blocks of various sizes at the <it>Drosophila </it>root</p>
            </caption>
            <text>
                <p>Comparison between number of syntenic blocks and total number of genes in syntenic blocks of various sizes at the <it>Drosophila </it>root. Values are normalized between 0 and 1, with the maximum value set to 1. The x-axis shows the criteria based on the different relaxed assumptions discussed in the text. Criterion 1: first-pass syntenic blocks. Criterion 2: results of further merging based on outgroup evidence. Criterion 3: further merging of syntenic blocks based on the relaxed assumption of bridging blocks using genes on block edges paired in at least one fly species. As additional evidence is incorporated using relaxed assumptions, blocks are merged into longer chains, which lowers the total number of syntenic blocks (1: 1,029 blocks, 2: 1,018 blocks, 3: 758 blocks). Correspondingly, the number of genes in larger blocks increases (for blocks >5 genes in size: 1: 5,532 genes, 2: 5,656 genes, 3: 6,576 genes; for blocks &#8805;20 genes in size: 1: 1,230 genes, 2: 1,329 genes, 3: 2,638 genes).</p>
            </text>
            <graphic file="gb-2007-8-11-r236-4"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>In contrast to existing approaches, this method provides a computationally fast technique that infers phylogenetic relationships among a given set of species and calculates rearrangement counts and probable ancestral syntenic blocks. The phylogenetic relationships within the genus <it>Drosophila </it>derived using arm-indexed NGPs (Figure <figr fid="F1">1a</figr>) match previously assumed relationships <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp> and support the clustering of <it>D. yakuba </it>with <it>D. erecta </it>rather than with <it>D. melanogaster</it>. This question had been debated in the <it>Drosophila </it>community <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>, with small-scale evidence supporting the alternative hypothesis until it was recently resolved <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. This clustering is also supported by a pericentric inversion between Muller elements B and C that is shared by <it>D. yakuba </it>and <it>D. erecta</it>, indicating a shared evolutionary event not found in <it>D. melanogaster </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Relaxing the arm-indexing criteria to include outgroup species (Figure <figr fid="F1">1b</figr>) expands the set of NGPs (to over 32,000) but causes a loss of signal between closely related species that share chromosomal architecture and might differ only slightly in gene order through transposition events. 
Arm-indexing is therefore a valuable tool in the phylogenetic analysis of closely related species that might share most of their paracentric inversions (owing to a common lineage) and differ only slightly in gene order as a result of a small number of arm transpositions or pericentric inversions.</p>
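         <p>As an illustration of arm-indexing, the following Python sketch (not the implementation used in this study) derives plain and arm-indexed NGPs from toy genomes. It assumes that each genome is given as a hypothetical mapping from a Muller element (arm) label to an ordered list of gene identifiers, that an NGP is an unordered pair of adjacent genes, and that an arm-indexed NGP is that pair tagged with its arm. Gene names and arm assignments are invented for the example.</p>
         <p>
```python
def neighboring_gene_pairs(arms):
    """Plain NGPs: unordered pairs of adjacent genes on each arm."""
    pairs = set()
    for genes in arms.values():
        for left, right in zip(genes, genes[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def arm_indexed_ngps(arms):
    """Arm-indexed NGPs: each adjacent pair tagged with its arm label."""
    pairs = set()
    for arm, genes in arms.items():
        for left, right in zip(genes, genes[1:]):
            pairs.add((arm, frozenset((left, right))))
    return pairs


# Hypothetical toy genomes (arm label -> ordered genes). The block g3-g4
# lies on arm B in species X but on arm C in species Y.
species_x = {"B": ["g1", "g2", "g3", "g4"], "C": ["g5", "g6"]}
species_y = {"B": ["g1", "g2"], "C": ["g3", "g4", "g5", "g6"]}

shared_plain = neighboring_gene_pairs(species_x).intersection(
    neighboring_gene_pairs(species_y))
shared_indexed = arm_indexed_ngps(species_x).intersection(
    arm_indexed_ngps(species_y))
print(len(shared_plain), len(shared_indexed))  # 3 2
```
         </p>
         <p>In this toy case the pair g3/g4 is conserved as a plain NGP in both species but sits on different arms, so it counts among the three shared plain NGPs but not among the two shared arm-indexed NGPs; this is the kind of difference that separates species sharing an arm transposition from those that do not.</p>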
         <p>The total rearrangement counts from the root of the <it>Drosophila </it>tree to each fly species indicate that species of the subgenus <it>Drosophila </it>(<it>D. virilis </it>(<it>Dvir</it>), <it>D. mojavensis </it>(<it>Dmoj</it>), <it>D. grimshawi </it>(<it>Dgri</it>)) have lower overall average branch lengths than species of the subgenus <it>Sophophora</it>, similar to the relative branch lengths in the <it>SRP </it>gene tree (Figure <figr fid="F2">2</figr>). The rearrangement count for <it>Anopheles gambiae </it>would be higher if the redistribution of shared genes across different arms were counted as separate events. Additional analysis of rearrangement rates <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> using the results of the NGP method is the subject of further study. To account for the differing quality of the species assemblies, this method identifies all genes on assembly scaffold edges and on singleton scaffolds. As a result, gene pairs broken at assembly scaffold edges do not lead to over-counting of rearrangement events because of poor assembly quality. Probable assembly errors can be identified as adjacent blocks that violate arm indexing and lack supporting evidence from other species, barring species-specific cases. Furthermore, the number of genes missing in a given species but present in two or more neighboring species gives an indication of assembly gaps in that species, assuming that single-taxon gene loss events are rare among closely related species.</p>
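         <p>The treatment of scaffold edges can be sketched as follows. This Python example is a hypothetical illustration, not the rule implemented in this study: it assumes that the ancestral NGPs are already known and that a species assembly is a list of scaffolds given as ordered gene lists. A missing ancestral pair is scored as a break only when both genes are assembled and at least one of them lies inside a scaffold; a pair whose genes both sit on scaffold edges (or singleton scaffolds) could span an assembly gap, so it is left unscored rather than counted as a rearrangement.</p>
         <p>
```python
def edge_genes(scaffolds):
    """Genes at either end of a scaffold, including singleton scaffolds."""
    edges = set()
    for genes in scaffolds:
        if genes:
            edges.update((genes[0], genes[-1]))
    return edges


def count_breaks(ancestral_pairs, scaffolds):
    """Return (breaks, unresolved) for ancestral NGPs in one assembly."""
    observed, present = set(), set()
    for genes in scaffolds:
        present.update(genes)
        for left, right in zip(genes, genes[1:]):
            observed.add(frozenset((left, right)))
    edges = edge_genes(scaffolds)

    breaks = unresolved = 0
    for pair in ancestral_pairs:
        if pair in observed:
            continue                  # ancestral adjacency conserved
        if not pair.issubset(present):
            unresolved += 1           # a gene is missing from the assembly
        elif pair.issubset(edges):
            unresolved += 1           # both genes on scaffold edges: maybe a gap
        else:
            breaks += 1               # an internal gene has other neighbors
    return breaks, unresolved


# Hypothetical ancestral order a..h and a two-scaffold draft assembly.
order = ["a", "b", "c", "d", "e", "f", "g", "h"]
ancestral = {frozenset(p) for p in zip(order, order[1:])}
scaffolds = [["a", "b", "c"], ["d", "e", "g", "f"]]
print(count_breaks(ancestral, scaffolds))  # (1, 2)
```
         </p>
         <p>Here c-d may span the gap between the two scaffolds and g-h involves a gene absent from the assembly, so both are left unresolved, while e-f is counted as a break because e is flanked on both sides by other genes.</p>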
         <p>The distribution of syntenic block sizes at the root of the <it>Drosophila </it>tree (Figure <figr fid="F4">4</figr>, Table <tblr tid="T1">1</tblr>) illustrates the effect of sequentially relaxed assumptions on the computation of syntenic blocks. The first-pass syntenic blocks (criterion 1) are bridged and extended using outgroup evidence (criterion 2) and subsequently using bridging pairs that occur in at least one species anywhere in the <it>Drosophila </it>tree (criterion 3). Each relaxation yields joins of progressively lower confidence, and under criterion 3 (Table <tblr tid="T1">1</tblr>) two possible joins may conflict. However, these relaxed criteria are consistent with our earlier assumption that identical NGPs are unlikely to be created independently in different species. The number of syntenic blocks starts at 1,029 in the initial analysis and then decreases (to 1,018 blocks with outgroup evidence and to 758 blocks with evidence from any one <it>Drosophila </it>species) as blocks are merged into longer blocks by incorporating additional evidence (Figure <figr fid="F4">4</figr>). The total gene count in larger syntenic blocks correspondingly increases as further evidence is added. The distribution of block sizes (Table <tblr tid="T1">1</tblr>) shows how chaining syntenic blocks produces larger blocks with higher gene counts as the assumptions are relaxed.</p>
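         <p>The successive bridging steps can be illustrated with a small greedy chaining routine. The Python sketch below is an assumption about how such chaining might be coded rather than the procedure used here: blocks are unsigned gene lists, bridging pairs are supplied in decreasing order of confidence (for example, outgroup-supported pairs before pairs observed in a single fly species), and a later bridge that conflicts with an earlier join is simply skipped, which is one possible way to handle the conflicts that can arise under criterion 3.</p>
         <p>
```python
def chain_blocks(blocks, bridges):
    """Greedily merge syntenic blocks whose edge genes form a bridging pair.

    blocks  -- list of ordered gene lists (unsigned, for simplicity)
    bridges -- (gene, gene) pairs, highest-confidence evidence first
    A bridge is used only if both genes sit at a free end of two different
    blocks; a bridge that conflicts with an earlier join is skipped.
    """
    blocks = [list(b) for b in blocks]  # work on copies of the input blocks
    for x, y in bridges:
        bx = next((b for b in blocks if b and x in (b[0], b[-1])), None)
        by = next((b for b in blocks if b and y in (b[0], b[-1])), None)
        if bx is None or by is None or bx is by:
            continue  # gene is internal, absent, or the join would close a cycle
        # Orient the blocks so that x ends bx and y starts by, then join.
        if bx[-1] != x:
            bx.reverse()
        if by[0] != y:
            by.reverse()
        bx.extend(by)
        blocks = [b for b in blocks if b is not by]
    return blocks


# Hypothetical first-pass blocks and bridging pairs, strongest evidence first.
blocks = [["a", "b"], ["c", "d"], ["f", "e"]]
bridges = [("b", "c"), ("d", "e"), ("b", "e")]
merged = chain_blocks(blocks, bridges)
print(merged)  # [['a', 'b', 'c', 'd', 'e', 'f']]
```
         </p>
         <p>The third bridge, b-e, conflicts with the joins already made (both genes are now inside the chain) and is therefore ignored.</p>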
         <p>Identifying genes that take part in multiple dissimilar NGPs at a rate above a threshold would yield a set of probable genetic loci near rearrangement hotspots. The association between these probable hotspots and transposable elements can be analyzed as such elements are characterized across different <it>Drosophila </it>species. The distribution of transposable elements on <it>Drosophila </it>chromosomes is known to be non-random <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp>. Some studies have implicated transposable elements, repeats and breakpoint motifs in generating chromosomal inversions in <it>Drosophila </it><abbrgrp><abbr bid="B13">13</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>. However, there is also evidence that rearrangement junctions might not be significantly enriched for transposable elements <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and that these elements might be over-represented in chromosomal regions with lower recombination rates <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp>.</p>
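         <p>A simple version of this screen is sketched below in Python. It is a hypothetical illustration that counts, for each gene, the distinct neighbors observed across all species and flags genes whose count exceeds a threshold; the count-based criterion and the default threshold of two are assumptions made for the example, not parameters from this study.</p>
         <p>
```python
from collections import defaultdict


def distinct_neighbors(genomes):
    """Map each gene to the set of distinct neighbors seen in any genome.

    genomes -- dict of species name -> list of scaffolds (ordered gene lists)
    """
    neighbors = defaultdict(set)
    for scaffolds in genomes.values():
        for genes in scaffolds:
            for left, right in zip(genes, genes[1:]):
                neighbors[left].add(right)
                neighbors[right].add(left)
    return neighbors


def candidate_hotspot_genes(genomes, threshold=2):
    """Genes with more distinct neighbors than `threshold` across genomes.

    A gene whose neighborhood is conserved in every genome has at most two
    distinct neighbors, so larger counts suggest rearrangements nearby.
    """
    neighbors = distinct_neighbors(genomes)
    return sorted(g for g, seen in neighbors.items() if len(seen) > threshold)


# Hypothetical gene orders for two species.
genomes = {
    "sp1": [["a", "b", "c", "d"]],
    "sp2": [["e", "c", "f"], ["a", "b", "d"]],
}
print(candidate_hotspot_genes(genomes))               # ['b', 'c']
print(candidate_hotspot_genes(genomes, threshold=3))  # ['c']
```
         </p>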
         <p>Although the simple computational approach presented here uses homologous protein coding genes and the corresponding NGPs, the method is applicable to a wide range of homologous genome markers. It falls under the broad class of parsimonious gene-order approaches <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, with a few differences. It relies on three biologically motivated ideas: inversions are rare events; pairs of adjacent genetic loci observed in multiple species probably existed in their common ancestor; and each inversion disrupts two pairs of neighboring genetic elements and creates two new pairs. Other key features are the use of a higher-order construct, arm-indexed NGPs, for phylogenetic clustering and a two-stage tree traversal procedure for inferring ancestral gene synteny. The first stage, inferring phylogenetic relationships by maximizing gene pair similarity (as opposed to the traditional distance measures used by other techniques), is motivated by the assumption that an NGP shared by several species was created by an inversion event along their shared lineage and has not since been disrupted by additional events (that is, it is an ancestral gene pair conserved in extant species). Additionally, the same NGP is unlikely to be found in species that do not share that lineage. Clustering certain species to the exclusion of others is based on maximizing 'exclusively shared NGPs' (NGPs found in all species in a cluster and in no species outside it; see Materials and methods). This allows the method to extract a strong signal for clustering species into smaller groups even though they might share other ancestral NGPs with evolutionarily more distant species. This is particularly evident when arm-indexed NGPs are used to form sub-clusters within a group of closely related species. 
The approach would reach its limits if single-taxon inversion events dominated (and lineage-specific inversion events were rare), resulting in homoplasy in the inversion dataset. For a given set of species, if the level of inversion homoplasy in the dataset rivals the number of 'exclusively shared NGPs' that cluster sub-groups of species together, the resulting loss of NGP signal would render this method ineffective. The second stage, inferring rearrangement counts, is motivated by the principle that NGPs seen in species on both sides of a node existed at that node with high probability, and that NGP disruptions result from shared inversion events (given the rarity of inversions) or from single-taxon inversion events. The same principles are used to infer ancestral syntenic blocks, where the evidence for chaining syntenic blocks comes from the derived ancestral NGPs and from NGPs conserved in outgroups, assuming that those pairs existed in the common ancestor rather than arising independently as a result of identical inversions in multiple lineages.</p>
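         <p>The clustering criterion can be written down compactly. The following Python sketch is illustrative only: it assumes that the (arm-indexed) NGPs of each species are already available as sets, uses opaque placeholder labels such as "p1" for NGPs, and scores every species group of a fixed size exhaustively, which is feasible only for a small number of species. It shows the 'exclusively shared NGP' score itself, not the full clustering procedure described in Materials and methods.</p>
         <p>
```python
from itertools import combinations


def exclusively_shared(ngp_sets, cluster):
    """NGPs present in every species of `cluster` and absent from all others.

    ngp_sets -- dict of species name -> set of (arm-indexed) NGPs
    cluster  -- iterable of species names
    """
    cluster = set(cluster)
    inside = set.intersection(*(ngp_sets[s] for s in cluster))
    outside = set().union(*(ngp_sets[s] for s in ngp_sets if s not in cluster))
    return inside - outside


def rank_clusters(ngp_sets, size=2):
    """Score every species group of `size` by its exclusively shared NGPs."""
    scored = [
        (len(exclusively_shared(ngp_sets, group)), group)
        for group in combinations(sorted(ngp_sets), size)
    ]
    return sorted(scored, reverse=True)


# Hypothetical NGP sets; "p9" is shared by all species (an ancestral pair)
# and therefore never counts as exclusively shared.
ngp_sets = {
    "A": {"p1", "p2", "p3", "p9"},
    "B": {"p1", "p2", "p4", "p9"},
    "C": {"p3", "p5", "p9"},
    "D": {"p5", "p6", "p9"},
}
print(rank_clusters(ngp_sets)[0])  # (2, ('A', 'B'))
```
         </p>
         <p>Species A and B share p1 and p2 exclusively, which outweighs the single exclusively shared NGP of C and D (p5) or of A and C (p3), even though all four species also share the ancestral pair p9.</p>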
         <p>These simple strategies give the method several advantages: simplicity, speed, tolerance of missing data and the flexibility to exploit various levels of biological assumptions. To overcome some of the speed and data size limitations of existing approaches, we make a number of practical assumptions and use the decision-making strategies discussed in the Materials and methods section. At least for relatively closely related species, the implementation avoids the more complex heuristics for NP-hard problems that are often employed <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B25">25</abbr></abbrgrp>. It also appears to be largely insensitive to assembly incompleteness and probable assembly errors.</p>
         <p>Compared with simple parsimony approaches that rely on sequence divergence (nucleotide or amino acid), gene-order-based approaches explore a much larger search space. We contrasted our approach with three existing parsimonious gene-order techniques: BPAnalysis <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, GRAPPA <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, and MGR <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. BPAnalysis addresses the NP-hard breakpoint median problem by casting it as a traveling salesman problem (TSP), minimizing the breakpoint distance between gene orders. Solving the TSP for all nodes across all possible trees is exponential in the number of genomes and the number of genes. BPAnalysis works for gene orders on uni-chromosomal genomes and trees of eight or fewer leaves <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. GRAPPA is an optimized re-implementation of BPAnalysis, based on the 'breakpoint distance' metric, with algorithmic improvements in execution speed and data size and the addition of inversion distance. It uses the TSP heuristic for breakpoint medians and a branch-and-bound strategy for inversion medians. GRAPPA is significantly faster than BPAnalysis and can solve either the breakpoint phylogeny or the inversion phylogeny problem; however, it remains an exponential-time algorithm for breakpoint phylogeny. It is limited to a few hundred genes per genome and works on uni-chromosomal genomes. Other approaches based on GRAPPA include GRIMM <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>, which works on pairs of genomes. MGR, which uses GRIMM for distance computation, follows a 'reversal-distance' minimization strategy and is applicable to multi-chromosomal genomes. For median inference, it identifies 'good reversals' that reduce the reversal distance between sets of three genomes and their ancestor. 
MGR is faster than GRAPPA and better able to handle multiple genomes; however, it has been tested only on a few hundred markers across genomes <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. In contrast to these techniques, the approach presented here handles multi-chromosomal datasets with thousands of markers.</p>
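         <p>For comparison, the breakpoint distance used by BPAnalysis and GRAPPA counts the adjacencies of one gene order that are broken in another. The Python sketch below computes it for two signed, uni-chromosomal gene orders; it ignores chromosome-end (telomere) adjacencies for brevity and is meant only to illustrate the metric, not to reproduce either program. It also shows that a single inversion disrupts two adjacencies.</p>
         <p>
```python
def adjacencies(order):
    """Signed adjacencies of a linear gene order.

    Reading the chromosome from the other strand turns (a, b) into (-b, -a),
    so both forms are stored.
    """
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(order_a, order_b):
    """Number of adjacencies of order_a that are not conserved in order_b.

    Both orders are assumed to contain the same genes; telomere adjacencies
    are ignored in this simplified version.
    """
    adj_b = adjacencies(order_b)
    return sum(1 for a, b in zip(order_a, order_a[1:]) if (a, b) not in adj_b)


# A single inversion of genes 2..3 creates two breakpoints.
print(breakpoint_distance([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))  # 2
```
         </p>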
         <p>For a run-time comparison, we used GRAPPA, the most widely used existing implementation of parsimonious gene-order analysis. GRAPPA has exponential run time in the number of genomes and the number of genes. Even after we limited the input dataset to one <it>Drosophila </it>chromosome arm (about 1,650 common genes per species, as opposed to over 8,000 common genes and over 14,000 NGPs across the genome in our analysis, and over 32,000 NGPs including outgroup species), GRAPPA had neither completed nor suggested a candidate phylogeny after running for over six hours. On the same dedicated Pentium 4 laptop computer and a significantly larger dataset, our clustering approach derives NGPs and suggests a candidate phylogeny within a few minutes, and our heuristic derives ancestral syntenic blocks in approximately 10 minutes.</p>
         <p>To further test our approach, we used a dataset of mitochondrial genomes previously used to evaluate parsimonious gene-order approaches <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. This is a set of 10 complete metazoan mitochondrial genomes <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> with 36 common genes, comprising two nematodes, two mollusks, two arthropods, two echinoderms, one annelid and one chordate <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. GRAPPA was previously shown to run for more than 48 hours on this dataset without suggesting a phylogeny <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. MGR generated a tree in agreement with the estimated phylogenetic relationships except for the clustering of the two arthropod genomes <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. Our approach tightly clustered the two arthropods together and clustered the other metazoan genomes in broad agreement with the estimated phylogeny <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>, with the single annelid genome as an outgroup (Additional data file 8).</p>
         <p>The primary limitations of existing approaches are speed and data size (typically only a few hundred markers). In contrast, this study used over 14,000 markers (Additional data file 2) to suggest a phylogeny within a few minutes and to complete ancestral gene order inference in approximately 10 minutes, for data on which other methods do not converge on a solution in any reasonable amount of time. Whereas other approaches, such as GRAPPA, require gene order and orientation information along a single chromosome, this approach accommodates incomplete assemblies of multi-chromosomal genomes, and the order and orientation of assembly scaffolds need not be known. Additionally, encoding contig and scaffold edge markers and indexing at the arm level allow useful information to be obtained despite assembly gaps.</p>
         <p>While this method provides a simple approach for inferring evolutionary relationships, rearrangement phylogeny, inversion count estimates and ancestral gene order, it has some limitations. To overcome some of the limitations inherent in parsimonious approaches <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> (see Materials and methods), a number of practical biological assumptions are used. To ensure valid inferences at ancestral states, a constraint on the maximum number of pairs that a gene can be part of is enforced at each ancestral state. Although the method cannot infer novel ancestral adjacencies absent from the input set, our results show that a high percentage of the known genes are assembled into ancestral syntenic blocks. Using the high-quality gene annotation of a single fly species (<it>D. melanogaster</it>) potentially biases this analysis because of lineage-specific genes. To overcome this problem, we use the set of genes (protein coding segments in our case) that have homologs in all fly species, approximating equal gene content. Because a majority of fly genes are shared across all fly species, this set covers a large percentage of the known genes. As additional gene models for other fly species become available, they should be included in this analysis; this will also allow gene gain and loss to be quantified correctly. Furthermore, homologous genome markers other than protein coding genes could also be used. This analysis can identify regions of missing assembly data and positions of likely errors. In fact, under a small set of reasonable assumptions, the approach can suggest corrections to incomplete genomic assemblies. However, as with any draft assembly, genome assembly errors are expected to affect this analysis, and progressive cleanup of the genome assemblies will lead to better results. 
Like other approaches, this method can be affected by incorrect identification of homologous genes in the presence of paralogs. We addressed this by selecting one member of each gene family as the best homolog (in the case of paralogs) based on local gene context and gene structure. Finally, the technique used to derive rearrangement break counts could readily be adapted to compute inversion counts along a branch.</p>
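         <p>The pair-count constraint can be illustrated with a greedy filter over candidate ancestral NGPs. In this hypothetical Python sketch, the support scores, the greedy order and the extra cycle check are assumptions made for illustration (the decision rules actually used are described in Materials and methods); the default limit of two pairs per gene reflects the fact that a gene on a linear chromosome has at most two neighbors.</p>
         <p>
```python
def accept_ancestral_pairs(candidates, max_pairs=2):
    """Greedily keep candidate ancestral NGPs under a per-gene pair limit.

    candidates -- list of (support, gene_a, gene_b) tuples
    max_pairs  -- maximum number of accepted pairs per gene
    Pairs are considered from highest to lowest support; a pair is rejected
    if either gene is already saturated or if it would close a cycle.
    """
    parent = {}  # union-find forest over genes, used to detect cycles

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    degree = {}
    accepted = []
    for _support, a, b in sorted(candidates, reverse=True):
        if degree.get(a, 0) >= max_pairs or degree.get(b, 0) >= max_pairs:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # a and b already lie on the same chain
        parent[root_a] = root_b
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        accepted.append((a, b))
    return accepted


# Hypothetical candidate pairs with invented support scores.
candidates = [(5, "a", "b"), (4, "b", "c"), (3, "c", "a"), (2, "b", "d"), (1, "c", "d")]
print(accept_ancestral_pairs(candidates))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```
         </p>
         <p>In this example c-a is rejected because a, b and c are already chained, and b-d is rejected because b already has two neighbors, so the accepted pairs form the single linear chain a-b-c-d.</p>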
         <p>When deriving phylogenetic relationships among a set of species, the NGP approach maximizes arm-indexed 'exclusively shared NGPs' (see Materials and methods). Although such constructs can increase certainty about tree topology, branch lengths inferred from rearrangements should be treated with caution, as evolutionary rates of rearrangement might differ among lineages <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. While arm-indexing of NGPs is a powerful tool for grouping species that share transposition events, such as the pericentric inversion in <it>D. yakuba </it>and <it>D. erecta </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, it is vulnerable to assembly errors and to single-species transpositions involving a large number of NGPs. Assembly errors that incorrectly join scaffolds belonging to different Muller elements might cause NGPs to be assigned an incorrect arm index based on the majority presence of homologs on the super-scaffold. Such inaccuracies can lead to incorrect phylogenetic partitioning. Additionally, a large number of real transposition or other rearrangement events in a single species could lead to different phylogenetic groupings, depending on the total number of NGPs involved in such events. If that total rivals the number of NGPs shared exclusively with a cluster of evolutionarily close species, the species would be placed outside the cluster. An extension of this study showed that the placement of <it>D. willistoni </it>differed from the classical <it>Drosophila </it>phylogeny <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> and from studies based on mutation clocks <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. Based on NGP analysis, after compensating for incorrect assembly joins, <it>D. willistoni </it>was placed as an outgroup to all the genus <it>Drosophila </it>species under consideration (data not shown). 
Additional analysis of <it>SRP54 </it>and <it>SRP19 </it>protein sequences using parsimony and maximum likelihood approaches gave mixed results, with one of the analyses agreeing with the NGP phylogenetic partitioning (data not shown). Alternative NGP clustering solutions (see Materials and methods) and the relative number of gene pairs involved (an indicator of the strength of clustering) could be used in conjunction with gene tree results to select a candidate phylogeny from a set of close alternatives suggested by the NGP approach.</p>
         <p>For inferring rearrangement counts, the method performs well on sets of closely related species in which the large majority of genes are conserved across all species. Within genus <it>Drosophila</it>, for example, the large number of shared genes yields a strong signal. However, evidence added from evolutionarily distant species is of limited use because its signal is weak. Contributing factors are the absence of homologous genes, a large number of rearrangement events on the branch leading to the outgroup species, and few shared NGPs. At the root of the <it>Drosophila </it>tree, for example, NGPs with conflicting evidence from the subgenus <it>Sophophora </it>and subgenus <it>Drosophila </it>sides of the tree would normally be resolved by the algorithm using evidence from outgroup species (Figure <figr fid="F5">5</figr>). However, the large evolutionary distance of the outgroup species used in this study dilutes the NGP signal, because of the large number of rearrangements along that branch. Only 2% of the ambiguities at the genus <it>Drosophila </it>root could be resolved with evidence from outgroup species (NGP evidence from at least one outgroup species and one <it>Drosophila </it>species). Consequently, a number of ambiguities that would probably resolve to '1' at the root remain unresolved. As a result, one limitation of this method is that it undercounts rearrangement breaks on the branches close to the root of the tree (of closely related species), owing to the diluted signal from outgroup species (Figures <figr fid="F3">3</figr> and <figr fid="F5">5</figr>).</p>
         <fig id="F5">
            <title>
               <p>Figure 5</p>
            </title>
            <caption>
               <p>Two-stage tree traversal algorithm example</p>
            </caption>
            <text>
               <p>Two-stage tree traversal algorithm example. Species A through G are shown with representative gene pair content (four pairs: ab, cd, ef, gh; an underscore '_' indicates that the pair does not exist in that species). The state of each pair is shown at each node, and state transitions are shown in bold. <b>(a) </b>Leaf-to-root traversal. Ancestral states of gene pairs are assigned under the constraint that a gene can be in at most two pairs at any given node. A '1' indicates that the pair exists at a node because at least one species on each side of the node has that pair. A '0' indicates that the pair does not exist in any leaf species reachable from that node. An 'X' indicates that the state is unknown because of conflicting 1/0 or X/0 information from the child nodes (that is, the pair is '1' or 'X' on one side of the node and '0' on the other). 0 &#8594; X, 1 &#8594; X, and X &#8594; 1 transitions are seen during this leaf-to-root traversal. For pairs like cd*, where a '1' and a '0' are inferred at the child nodes of the root and there is no further evidence from outgroup species, the state is left undetermined and does not contribute to the rearrangement analysis. Adding more genomes to this analysis may resolve such cases in the future. Where the root value is 'X' (as for pair gh**), it is set to '1' if an outgroup species has this pair (given that the pair already exists in at least one non-outgroup species); otherwise, the pair does not contribute to this analysis. <b>(b) </b>Root-to-leaf traversal. For this example, pair gh is assumed to have been set to '1' at the root using the criteria above. Rearrangements are assigned to tree branches. A 0 &#8594; 1 transition reflects the creation of a pair that did not exist in the ancestral state, including pairs unique to a species. A 1 &#8594; 0 transition represents the loss of a pair due to a rearrangement. 
X &#8594; 0 and X &#8594; 1 transitions at nodes represent inheritance of an inferred ancestral state where the current value is unknown because of conflicting child evidence. The rearrangement phylogeny counts the number of 1 &#8594; 0 transitions (NGP disruptions) along each branch. See Additional data file 10 for a detailed description of the method.</p>
            </text>
            <graphic file="gb-2007-8-11-r236-5"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>This approach has been shown to outperform existing techniques in speed and in its ability to handle genome-scale datasets far exceeding current limitations. The primary features of this method are the ability to handle multi-chromosomal datasets with thousands of markers, the use of 'exclusively shared NGPs' for clustering, and the use of arm indexing to amplify the signal between closely related species. Further features are the accommodation of incomplete genome assemblies and a two-stage tree traversal that uses biologically relevant assumptions to infer ancestral states. The results place major aspects of the currently accepted evolutionary relationships among <it>Drosophila </it>species on a solid footing based on full-genome comparative analysis. The clustering supports the placement of <it>D. yakuba </it>based on a large set of markers (over 14,000). This analysis has, for the first time, provided an accurate lower bound for the number of chromosomal rearrangements that might have occurred among these species since their last common ancestor. Using a sequence of progressively less stringent assumptions, a set of likely ancestral syntenic gene clusters of increasing size has been inferred. As additional fly and insect genomes become available, this analysis can readily be extended to include additional evidence to refine the results.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <p>An important assumption exploited in this work is that chromosomal inversions in a given nucleotide sequence are rare events, each of which disrupts two pairs of neighboring genes. A further assumption is that the likelihood of the same inversion occurring independently along disjoint lineages is low. Neighboring pairs of homologous genes (NGPs) that show the same pair-wise orientation in distant species are considered to have escaped disruption by genomic inversions. Furthermore, although over 8,000 genes could in theory form a very large number of gene pairs, in practice only a fraction of these pairs is observed across all species. We assume that the probability of an inversion creating an NGP from an ancestral gene order is small, and smaller still if the NGP is seen in multiple species. In other words, an NGP found in multiple species is assumed to have existed in their common ancestor. The ancestral state is thus derived by maximizing the similarity between extant species.</p>
         <p>The method outlined below belongs to the general class of parsimonious gene order methods <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B61">61</abbr></abbrgrp> used for phylogenetic analysis, with extensions based on the assumptions above. Most phylogenetic optimization problems, including the breakpoint median problem, are known to be NP-hard <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B23">23</abbr><abbr bid="B54">54</abbr></abbrgrp>. Similar to some previous approaches <abbrgrp><abbr bid="B61">61</abbr><abbr bid="B62">62</abbr></abbrgrp>, we reduce the set of genes to a binary encoding based on gene adjacency. We overcome some of the known limitations of parsimonious gene order approaches with a number of simplifying biological assumptions, which prove to be practical. These assumptions, which relate to constraints on ancestral states, varying gene content, and ortholog identification, are outlined below. We extend previous techniques to include orientation and chromosome arm (<it>Drosophila </it>Muller element) information. We then infer a phylogenetic tree topology by clustering species so as to maximize the number of shared pairs unique to each cluster. Following this, we estimate rearrangement counts as described below. In contrast to the maximum parsimony on binary encodings (MPBE) approach <abbrgrp><abbr bid="B61">61</abbr><abbr bid="B62">62</abbr></abbrgrp>, we have added arm indexing information to strengthen the signal between closely related species. Clustering is based on 'exclusively shared NGPs' between groups of species rather than on straightforward parsimony analysis of encoded sequences.</p>
         <p>All pairs of adjacent orthologous genes are identified across a set of eight fly species and four outgroup species. Beginning with <it>D. melanogaster</it>'s release 4.3 annotated gene set <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B63">63</abbr></abbrgrp>, genome sequences for seven other fly species <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> (version CAF1: comparative analysis freeze 1) and four outgroup species (<it>A. gambiae/Agam </it><abbrgrp><abbr bid="B65">65</abbr><abbr bid="B66">66</abbr></abbrgrp>, <it>Aedes aegypti/Aaeg </it><abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr></abbrgrp>, <it>Apis mellifera/Amel </it><abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp>, <it>Tribolium castaneum/Tcas </it><abbrgrp><abbr bid="B71">71</abbr></abbrgrp>) were used. The seven <it>Drosophila </it>species used, other than <it>Dmel</it>, are: <it>D. yakuba </it>(<it>Dyak</it>), <it>D. erecta </it>(<it>Dere</it>), <it>D. ananassae </it>(<it>Dana</it>), <it>D. pseudoobscura </it>(<it>Dpse</it>), <it>D. virilis </it>(<it>Dvir</it>), <it>D. mojavensis </it>(<it>Dmoj</it>), and <it>D. grimshawi </it>(<it>Dgri</it>). This potentially large data set of adjacent gene pairs was stored in a simple and compact binary data structure. A simple parsimonious clustering, based on maximizing the number of common gene pairs unique to a cluster, was performed to infer a phylogenetic tree. Gene pairs unique to a single species point to rearrangements specific to that species. A two-stage iterative procedure that walks from the leaves of the <it>Drosophila </it>tree to the root and back to the leaves was then used to infer rearrangements along specific branches of the phylogenetic tree. It was also used to infer syntenic blocks (gene-ordered clusters) at various nodes in the tree, including the root of the genus <it>Drosophila </it>tree, using a set of progressively relaxed criteria. 
The resulting dataset gives a probable ancestral gene arrangement and syntenic block structure at the root of the <it>Drosophila </it>tree. The key steps of this method are outlined below and detailed in Additional data file 10.</p>
         <sec>
            <st>
               <p>Homologous gene identification</p>
            </st>
            <p>In each species, genes homologous to the reference set (<it>D. melanogaster</it>) are identified while accommodating assembly gaps <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>. This was done using standard sequence comparison methods that maximize sequence similarity, including tBLASTn, together with techniques to distinguish orthologs from paralogs arising from gene duplication. Neighboring gene context was also used to identify homologs. We recognize that this introduces some circularity, since NGPs are later used to create syntenic blocks. Although homologous genes have been used in this analysis, the method presented here is applicable to a wide range of genomic markers for which homology between species can be determined, such as non-protein-coding genes, microRNAs and transposable elements.</p>
         </sec>
         <sec>
            <st>
               <p>Adjacent gene-pair classification</p>
            </st>
            <p>Using the homologous gene set for each species, pairs of adjacent genes are recorded together with their mutual orientation (direction of transcription). A pair consists of two adjacent genes, in a specific order, that are convergent (&#8594;&#8592;), divergent (&#8592;&#8594;), or transcribed in the same direction (&#8594;&#8594; and &#8592;&#8592;). The mutual order of transcription starts and ends is important in determining equivalence between pairs. To accommodate gaps in the genome assembly, genes found at the edges of assembly scaffolds are recorded as part of special pairs (_&#8592;, _&#8594;, &#8592; _, &#8594; _). Finally, scaffolds with a single gene hit are also recorded (_&#8592; _, _&#8594; _).</p>
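            <p>The classification above can be illustrated with a minimal Python sketch. It assumes a hypothetical input format in which each scaffold is a list of (gene, strand) tuples in coordinate order. The strand symbols '+' and '-' stand in for the rightward and leftward arrows used in the text, and '_' marks a scaffold edge. The function names are illustrative and are not taken from the authors' code.</p>
            <p>
```python
from typing import List, Tuple

# Hypothetical input format: a scaffold is a list of (gene_id, strand) tuples
# in coordinate order. Strand "+" stands for a rightward arrow in the text and
# "-" for a leftward arrow; "_" marks a scaffold edge (assembly gap).
Gene = Tuple[str, str]


def orientation_class(left: str, right: str) -> str:
    """Mutual orientation of two adjacent genes, given their strands."""
    if (left, right) == ("+", "-"):
        return "convergent"      # transcribed toward each other
    if (left, right) == ("-", "+"):
        return "divergent"       # transcribed away from each other
    return "same direction"      # "++" or "--"


def classify_pairs(scaffold: List[Gene]) -> List[Tuple[str, str]]:
    """Return (pair, label) records for every adjacency on one scaffold."""
    tokens = [name + strand for name, strand in scaffold]
    if not tokens:
        return []
    if len(tokens) == 1:
        # A scaffold with a single gene hit is recorded as its own marker.
        return [("_ " + tokens[0] + " _", "single gene")]
    records = [("_ " + tokens[0], "scaffold edge")]       # left scaffold edge
    for k in range(len(scaffold) - 1):
        left, right = scaffold[k], scaffold[k + 1]
        records.append((tokens[k] + " " + tokens[k + 1],
                        orientation_class(left[1], right[1])))
    records.append((tokens[-1] + " _", "scaffold edge"))  # right scaffold edge
    return records


for pair, label in classify_pairs([("a", "+"), ("b", "-"), ("c", "-")]):
    print(pair, "|", label)
```
            </p>
            <p>For the hypothetical scaffold a+ b- c-, the sketch records two scaffold-edge markers, one convergent pair (a+ b-) and one same-direction pair (b- c-).</p>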
         </sec>
         <sec>
            <st>
               <p>Data structure</p>
            </st>
            <p>The data structure used to capture gene adjacency information is a five-dimensional binary matrix. It represents the presence or absence of a given gene pair in a given species, with chromosomal location (arm) and orientation also encoded:</p>
            <p>
               <display-formula><b>M*<sub><it>i</it>, <it>j</it>, <it>o</it>, <it>s</it>, <it>m </it></sub></b> &#8712; {0,1}</display-formula>
            </p>
            <p>where <it>i</it>, <it>j </it>= genes <it>i </it>and <it>j</it>; <it>o </it>= one of the gene pair orientations identified above; <it>s </it>= a given species; <it>m </it>= the arm index for the gene pair. A '1' indicates that a gene pair consisting of adjacent genes <it>i </it>and <it>j </it>in a specific pair-wise orientation <it>o </it>exists in species <it>s </it>on the chromosomal arm (<it>Drosophila </it>Muller element) encoded by <it>m</it>; a '0' indicates that it does not. Assembly scaffold edges and single-gene scaffolds can be included as special-case gene markers. Given the symmetric nature and sparse content of this matrix, standard storage optimization techniques can be used to reduce the size of the stored binary data. Chromosomal arm encoding is mainly useful for resolving relationships between closely related species that share the same chromosome architecture. For comparisons with outgroup species that have a different chromosomal architecture, this indexing requirement can be relaxed, because NGP differences due to evolutionary divergence will dominate. From this basic data structure, binary-encoded arrays that allow easy lookup of NGPs across species can be derived (Additional data file 10).</p>
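            <p>A sparse representation of this matrix can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' implementation. Only the 1-entries are stored, and the orientation <it>o </it>is implied by the two gene strands. Each NGP is reduced to a canonical, strand-independent key, which is one way to exploit the symmetry of the matrix. The species dimension <it>s </it>is packed into an integer bitmask, which is one possible form of the binary-encoded lookup arrays mentioned above. The class and function names are hypothetical.</p>
            <p>
```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical key for one NGP: (gene_i, strand_i, gene_j, strand_j, arm).
# The orientation o of the text is implied by the two strands; arm (the
# Muller element m) may be None when arm indexing is relaxed.
NGP = Tuple[str, str, str, str, Optional[str]]
FLIP = {"+": "-", "-": "+"}


def canonical(gi: str, si: str, gj: str, sj: str, arm: Optional[str] = None) -> NGP:
    """Choose one representative for a pair that can be read from either strand.

    Reading "gi(si) gj(sj)" from the opposite strand gives "gj(-sj) gi(-si)",
    which is the same adjacency; keeping the smaller of the two tuples means
    each NGP is stored only once.
    """
    forward = (gi, si, gj, sj)
    reverse = (gj, FLIP[sj], gi, FLIP[si])
    return min(forward, reverse) + (arm,)


class SparseNGPMatrix:
    """Stores only the 1-entries of M: for each NGP, a bitmask of species."""

    def __init__(self, species: Iterable[str]):
        self.species = list(species)
        self.bit = {name: 2 ** k for k, name in enumerate(self.species)}
        self.present: Dict[NGP, int] = defaultdict(int)

    def add(self, species: str, gi: str, si: str, gj: str, sj: str,
            arm: Optional[str] = None) -> None:
        """Set M[i, j, o, s, m] = 1."""
        self.present[canonical(gi, si, gj, sj, arm)] |= self.bit[species]

    def lookup(self, gi: str, si: str, gj: str, sj: str,
               arm: Optional[str] = None) -> FrozenSet[str]:
        """Species in which this NGP occurs (the bitmask, decoded)."""
        mask = self.present.get(canonical(gi, si, gj, sj, arm), 0)
        return frozenset(s for s in self.species if mask // self.bit[s] % 2 == 1)


m = SparseNGPMatrix(["Dmel", "Dyak", "Dere"])
m.add("Dmel", "a", "+", "b", "+", "B")
m.add("Dyak", "b", "-", "a", "-", "B")   # same NGP, read from the other strand
print(sorted(m.lookup("a", "+", "b", "+", "B")))  # prints ['Dmel', 'Dyak']
```
            </p>
            <p>Passing <it>arm=None </it>for every call corresponds to the relaxed, non-arm-indexed variant used when outgroup species are included.</p>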
         </sec>
         <sec>
            <st>
               <p>Phylogenetic reconstruction via clustering</p>
            </st>
            <p>Using a simple hierarchical clustering approach, pairs or groups of species are clustered so as to maximize the number of shared gene pairs unique to the clustered group ('exclusively shared NGPs'). This is based on the idea that species sharing an evolutionary lineage possess (or lack) a number of identical inversions and hence share NGPs unique to the group. With arm-indexed NGPs, this approach also allows groups of closely related species to be placed in smaller clusters, even though they might also share NGPs with more distant species. Alternatively, species could be clustered based on the total number of shared pairs (not necessarily unique to the grouping). Clustering proceeds in order of decreasing cardinality until a binary partitioning of the species is obtained. As a simple validation, this clustering was compared with a gene tree (Figure <figr fid="F2">2</figr>) generated with PHYLIP <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> from coding sequence predictions for the <it>SRP54 </it>and <it>SRP19 </it>genes of the <it>Drosophila </it>species <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> (Drosophila 12 Genomes Consortium, 2007).</p>
            <p>Intermediate results that violate previously established cluster boundaries can be analyzed for alternative or weak relationships between species. For example, in the arm-indexed clustering results for these species (Additional data file 4), the first violation, with 463 NGPs, shows that <it>D. melanogaster </it>shares a significant number of NGPs with all the flies except <it>D. yakuba </it>and <it>D. erecta</it>. Because these three species were previously clustered together (544 NGPs), rearrangements in the <it>D. yakuba </it>and <it>D. erecta </it>lineage should account for the disruption or translocation of these 463 NGPs. This is borne out by the strong clustering of <it>D. yakuba </it>and <it>D. erecta </it>(751 NGPs), where the underlying translocation and inversion events account for the disruption of NGPs previously shared with <it>D. melanogaster</it>. The second violation, with 357 NGPs (found in all species except <it>D. pseudoobscura</it>), indicates that <it>D. pseudoobscura </it>exhibits a large number of taxon-specific inversion events (and hence unique NGPs). This is also borne out by the first line labeled 'leaf', which gives the actual number of <it>D. pseudoobscura</it>-specific NGPs (937) derived from this dataset. Analysis of the non-arm-indexed NGP clustering results (Additional data file 5) shows alternative clusters for the <it>D. melanogaster</it>, <it>D. yakuba </it>and <it>D. erecta </it>trio: <it>D. melanogaster </it>+ <it>D. erecta </it>(16 NGPs), <it>D. yakuba </it>+ <it>D. erecta </it>(15 NGPs), and <it>D. melanogaster </it>+ <it>D. yakuba </it>(9 NGPs). Arm-indexed clustering shows a strong signal reflecting an underlying shared pericentric inversion and selects the second of these three solutions.</p>
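            <p>The clustering described in this section can be approximated by a greedy, compatibility-based procedure, sketched below in Python. This is our reading of clustering 'in order of decreasing cardinality', not the authors' exact algorithm (see Additional data file 10). Every NGP is grouped by the exact set of species that carry it, and these sets are ranked by their number of exclusively shared NGPs. Each set is accepted only if it nests within, or is disjoint from, every cluster accepted so far. Rejected sets correspond to the 'violations' discussed above. The input format and function names are hypothetical.</p>
            <p>
```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple

Cluster = FrozenSet[str]


def exclusive_counts(ngp_species: Dict[str, Iterable[str]]) -> Counter:
    """Count NGPs by the exact set of species in which they occur.

    An NGP present in exactly the species of a set S, and in no other species,
    is 'exclusively shared' by S; single-species sets are leaf-specific NGPs.
    """
    return Counter(frozenset(species) for species in ngp_species.values())


def compatible(a: Cluster, b: Cluster) -> bool:
    """Two clusters fit in one rooted tree if they are nested or disjoint."""
    return a.isdisjoint(b) or a.issubset(b) or b.issubset(a)


def greedy_clusters(ngp_species: Dict[str, Iterable[str]], all_species: Iterable[str]):
    """Accept clusters in order of decreasing exclusively shared NGP count.

    A cluster is accepted only if it is compatible with every cluster accepted
    so far; incompatible ones are kept as 'violations' for inspection. The loop
    stops once a fully resolved (binary) tree is reached: n - 2 clusters that
    are neither single leaves nor the whole species set.
    """
    everyone = frozenset(all_species)
    ranked = sorted(exclusive_counts(ngp_species).items(),
                    key=lambda item: (-item[1], sorted(item[0])))
    accepted: List[Tuple[Cluster, int]] = []
    violations: List[Tuple[Cluster, int]] = []
    for cluster, count in ranked:
        if len(cluster) == 1 or cluster == everyone:
            continue  # leaf-specific or universal NGPs carry no grouping signal
        if all(compatible(cluster, other) for other, _ in accepted):
            accepted.append((cluster, count))
            if len(accepted) == len(everyone) - 2:
                break
        else:
            violations.append((cluster, count))
    return accepted, violations


ngps = {"n1": ["A", "B"], "n2": ["A", "B"], "n3": ["B", "A"], "n4": ["B", "C"],
        "n5": ["C", "B"], "n6": ["C", "D"], "n7": ["A"]}
accepted, violations = greedy_clusters(ngps, ["A", "B", "C", "D"])
print(accepted, violations)
```
            </p>
            <p>In this toy input, {A, B} has three exclusive NGPs, {B, C} two and {C, D} one. {B, C} is reported as a violation, and the accepted clusters are {A, B} and {C, D}.</p>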
         </sec>
         <sec>
            <st>
               <p>Rearrangement counts along various evolutionary paths</p>
            </st>
            <p>The rearrangement phylogeny estimates the number of inferred ancestral NGP disruptions along each branch of the evolutionary tree. An estimate of the inversion count can be computed from a rearrangement phylogeny as the number of inversion events that would produce the observed rearrangements (each inversion disrupts two ancestral gene pairs and creates two new pairs). Once the phylogenetic relationships have been derived, a two-stage tree traversal can be used to infer the rearrangement phylogeny. The arm-level indexing criterion is relaxed at this stage so that NGPs on different arms can contribute to ancestral gene order inference. This allows pericentric inversions and segmental transpositions to be taken into account. The process is summarized with a simple example in Figure <figr fid="F5">5</figr>. First, a tree traversal from the leaves to the root is used to infer the NGPs shared by each ancestral node and its child nodes, based on our heuristic of maximizing the similarity between extant species at any ancestral node. An ancestral node is assumed to have had an NGP if any two leaves reachable from that node along disjoint paths show that NGP. The motivation behind this heuristic is the assumption that an NGP present in at least one species on each side of a node exists at that node, because independent inversions are unlikely to create the same pair in different species. Conflicting evidence from the two child nodes, which suggests that rearrangements might have taken place along the path to one child, is noted. These ambiguities are resolved locally using evidence from the next species along the hierarchy or from species higher up in the hierarchy, including outgroup species. 
In cases where a node is inferred to have one NGP corresponding to one pair of a two-break inversion event relative to a neighboring species, the other pair can be inferred to exist at the node if its assignment is ambiguous. Additionally, a gene is constrained to be part of at most two pairs at any given node. Ambiguities in determining the NGPs at the root of the genus <it>Drosophila </it>tree are resolved, as far as possible, using outgroup species. After the leaf-to-root traversal is complete, a traversal from the root to the leaves is initiated. During this process, any remaining ambiguities at internal nodes are resolved, wherever possible, by inheriting the ancestral state of the NGP. Rearrangement counts along each branch are then estimated by counting the cases in which an NGP exists at a given node but not at its child node. This gives the rearrangement phylogeny and an alternative estimate of the branch lengths of the phylogenetic tree.</p>
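            <p>The two-stage traversal can be sketched in Python for a single rooted binary tree, following the state rules given in the caption of Figure <figr fid="F5">5</figr>. In the leaf-to-root pass, nodes take the states '1', '0' or 'X'. In the root-to-leaf pass, an 'X' inherits the parent's state. An 'X' at the root is resolved to '1' only with outgroup support. This is a simplified illustration under stated assumptions. The tree is a nested tuple with unique leaf names, and outgroup evidence is reduced to a set of NGP identifiers. The two-pairs-per-gene constraint and the two-break inference step are not implemented. All names are hypothetical.</p>
            <p>
```python
from typing import Dict, Set, Tuple, Union

# Hypothetical representation: a leaf is a species name and an internal node
# is a (left, right) tuple; leaf names must be unique so subtrees can be keys.
Tree = Union[str, Tuple["Tree", "Tree"]]
UNKNOWN = "X"


def up_pass(node: Tree, has_pair: Set[str], states: Dict[Tree, str]) -> str:
    """Leaf-to-root pass for one NGP: assign '1', '0' or 'X' to every node."""
    if isinstance(node, str):
        state = "1" if node in has_pair else "0"
    else:
        left = up_pass(node[0], has_pair, states)
        right = up_pass(node[1], has_pair, states)
        if left == "0" and right == "0":
            state = "0"         # absent from every leaf below this node
        elif left == "0" or right == "0":
            state = UNKNOWN     # 1/0 or X/0: conflicting child evidence
        else:
            state = "1"         # present on both sides (1/1, 1/X or X/X)
    states[node] = state
    return state


def down_pass(node: Tree, parent_state: str, states: Dict[Tree, str],
              breaks: Dict[Tree, int]) -> None:
    """Root-to-leaf pass: resolve 'X' by inheritance and count 1 -> 0 breaks."""
    state = states[node]
    if state == UNKNOWN:
        state = parent_state    # X -> 0 or X -> 1: inherit the ancestral state
        states[node] = state
    if parent_state == "1" and state == "0":
        breaks[node] = breaks.get(node, 0) + 1   # NGP disrupted on this branch
    if not isinstance(node, str):
        for child in node:
            down_pass(child, state, states, breaks)


def count_breaks(tree: Tree, ngps: Dict[str, Set[str]],
                 outgroup_pairs: Set[str]) -> Dict[Tree, int]:
    """Rearrangement breaks per branch, keyed by the node below the branch.

    ngps maps an NGP id to the set of ingroup species that have it;
    outgroup_pairs holds the ids of NGPs seen in at least one outgroup.
    """
    breaks: Dict[Tree, int] = {}
    for ngp, has_pair in ngps.items():
        states: Dict[Tree, str] = {}
        root_state = up_pass(tree, has_pair, states)
        if root_state == UNKNOWN:
            if ngp not in outgroup_pairs:
                continue        # undetermined at the root: not counted
            root_state = "1"    # outgroup evidence resolves the root to '1'
            states[tree] = root_state
        down_pass(tree, root_state, states, breaks)
    return breaks


tree = (("A", "B"), ("C", "D"))
ngps = {"ab": {"A", "B", "C", "D"}, "cd": {"A", "B"}, "gh": {"A", "C"},
        "ef": {"A", "C", "D"}, "ij": {"B"}}
print(count_breaks(tree, ngps, outgroup_pairs={"ij"}))
# prints {'B': 2, 'D': 1, 'A': 1, ('C', 'D'): 1}
```
            </p>
            <p>Pair cd is left undetermined at the root, as in the caption, because it lacks outgroup support. Pair ij, found in a single species but supported by an outgroup, produces a break on the internal branch leading to the (C, D) subtree.</p>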
            <p>Rearrangement events are assumed to result from inversions, each of which disrupts two NGPs present in the ancestral state and creates two new NGPs. Inversions along various paths can therefore be counted using the fact that four pairs (two disruptions and two creations) are involved in each inversion. The number of disrupted pairs counted in the rearrangement phylogeny, divided by two, thus estimates the inversion count. This analysis could be extended with a correction factor for the reuse of rearrangement breakpoints, allowing for reuse rates that vary between species.</p>
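            <p>The counting argument can be checked on a toy gene order with a short Python sketch. It assumes hypothetical signed genes (name, strand) and uses a strand-independent key for each adjacency, so that a pair read from the opposite strand is recognized as the same NGP. Inverting the segment b c disrupts the ancestral pairs a b and c d and creates the pairs a c and b d. The pair inside the segment stays intact, because it is simply read from the other strand. The result is an estimate of one inversion.</p>
            <p>
```python
from typing import List, Set, Tuple

FLIP = {"+": "-", "-": "+"}
Signed = Tuple[str, str]              # (gene, strand)
Adjacency = Tuple[Signed, Signed]


def adjacency(a: Signed, b: Signed) -> Adjacency:
    """Strand-independent key for the neighboring pair 'a b'."""
    reverse = ((b[0], FLIP[b[1]]), (a[0], FLIP[a[1]]))
    return min((a, b), reverse)


def adjacencies(order: List[Signed]) -> Set[Adjacency]:
    """All NGPs of a linear gene order."""
    return {adjacency(a, b) for a, b in zip(order, order[1:])}


def invert(order: List[Signed], start: int, end: int) -> List[Signed]:
    """Reverse order[start:end] and flip the strand of every gene in it."""
    segment = [(g, FLIP[s]) for g, s in reversed(order[start:end])]
    return order[:start] + segment + order[end:]


def inversion_estimate(disrupted: int) -> float:
    """Each inversion disrupts two ancestral NGPs, so halve the break count."""
    return disrupted / 2


ancestor = [("a", "+"), ("b", "+"), ("c", "+"), ("d", "+"), ("e", "+")]
child = invert(ancestor, 1, 3)                       # a+ c- b- d+ e+
lost = adjacencies(ancestor) - adjacencies(child)    # disrupted NGPs
gained = adjacencies(child) - adjacencies(ancestor)  # newly created NGPs
print(len(lost), len(gained), inversion_estimate(len(lost)))  # prints 2 2 1.0
```
            </p>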
            <p>In cases where an NGP is found in exactly one species on each side of a node and in no other species, a rearrangement break for this NGP is assigned to every top-level internal branch that leads to other species clusters. A break is also assigned to the leaf branches in the same cluster as each species having the NGP. The effect is most extreme when there are many species on both sides of the node. Although this is expected to be rare, genes at the edges of genomic hotspots can contribute to it.</p>
         </sec>
         <sec>
            <st>
               <p>Ancestral syntenic block inference</p>
            </st>
            <p>Chaining together NGPs that share a gene (in the correct orientation), or bridging blocks using existing NGPs, ultimately yields a set of ancestral syntenic blocks at the root of the <it>Drosophila </it>tree. We apply a set of progressively less stringent assumptions. First, we recursively chain together NGPs that have a gene in common (in the correct orientation). Because a gene is restricted to at most two pairs, conflicts have already been resolved at the earlier stage that determines whether an NGP exists at a node. This process generates an initial set of syntenic blocks of genes at an ancestral node. Blocks can be extended with NGPs or other blocks into larger entities only if the bridging NGP matches the mutual orientation of the genes at the edges of these blocks. It is occasionally necessary to flip a block or NGP to make this possible.</p>
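            <p>The chaining step can be sketched in Python by working with gene extremities, that is, the tail and head of each gene in the direction of transcription. This is a common device in genome rearrangement analysis that we adopt here as an assumption, not as the authors' method. Each NGP links one extremity of each of its two genes, and a block is a maximal path of such links. The end through which a gene is entered determines its strand in the block, so the orientation check and any flipping of NGPs or blocks happen implicitly. Requiring every extremity to be used at most once enforces the two-pairs-per-gene constraint. The input format and names are hypothetical.</p>
            <p>
```python
from typing import Dict, List, Set, Tuple

Signed = Tuple[str, str]   # (gene, strand), strand "+" or "-"
End = Tuple[str, str]      # (gene, "t") for the tail or (gene, "h") for the head


def right_end(g: Signed) -> End:
    """Extremity of gene g that faces its right-hand neighbor."""
    return (g[0], "h") if g[1] == "+" else (g[0], "t")


def left_end(g: Signed) -> End:
    """Extremity of gene g that faces its left-hand neighbor."""
    return (g[0], "t") if g[1] == "+" else (g[0], "h")


def opposite(e: End) -> End:
    return (e[0], "h" if e[1] == "t" else "t")


def chain_blocks(ngps: List[Tuple[Signed, Signed]]) -> List[List[Signed]]:
    """Chain NGPs that share a gene into oriented syntenic blocks."""
    link: Dict[End, End] = {}
    genes: Set[str] = set()
    for a, b in ngps:
        x, y = right_end(a), left_end(b)
        if x in link or y in link:
            # Each gene end joins at most one NGP: at most two pairs per gene.
            raise ValueError(f"conflicting NGP {a} {b}")
        link[x], link[y] = y, x
        genes.update((a[0], b[0]))
    blocks: List[List[Signed]] = []
    seen: Set[str] = set()
    for gene in sorted(genes):
        if gene in seen:
            continue
        # Walk outward from the gene's tail until reaching an unpaired end,
        # which is one terminus of the block containing this gene.
        end, steps = (gene, "t"), 0
        while end in link:
            end = opposite(link[end])
            steps += 1
            if steps > len(genes):
                raise ValueError("circular chain of NGPs")
        # Read the block from that terminus; the end through which each gene
        # is entered fixes its strand, so flips happen automatically.
        block: List[Signed] = []
        while True:
            block.append((end[0], "+" if end[1] == "t" else "-"))
            seen.add(end[0])
            exit_end = opposite(end)
            if exit_end not in link:
                break
            end = link[exit_end]
        blocks.append(block)
    return blocks


ngps = [(("a", "+"), ("b", "+")),
        (("b", "+"), ("c", "-")),
        (("d", "-"), ("c", "+")),   # stored on the other strand: c- d+
        (("e", "-"), ("f", "-"))]
print(chain_blocks(ngps))
```
            </p>
            <p>Here the NGP d- c+ is used in its flipped form, c- d+, to extend the block a+ b+ c-. The pair e- f- is reported from its other strand as f+ e+.</p>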
            <p>The criteria for forming and enlarging syntenic blocks can be progressively relaxed, based on the assumption that independent inversion events are unlikely to bring together a particular pair of genes in disjoint lineages. The criteria are as follows.</p>
            <sec>
               <st>
                  <p>Criterion 1</p>
               </st>
               <p>Under criterion 1, the procedure described in the example (Figure <figr fid="F5">5</figr>) is used to determine the NGPs that exist at the <it>Drosophila </it>root, and these NGPs are used to form syntenic blocks as described above.</p>
            </sec>
            <sec>
               <st>
                  <p>Criterion 2</p>
               </st>
               <p>For criterion 2, extend syntenic blocks by recursively chaining together blocks whose edge genes form a gene pair with the correct mutual orientation in at least one fly species and at least one outgroup species. Flipping of NGPs or blocks might be necessary to accomplish this.</p>
            </sec>
            <sec>
               <st>
                  <p>Criterion 3</p>
               </st>
               <p>For criterion 3, extend syntenic blocks by chaining together blocks whose edge genes form a gene pair with the correct mutual orientation in at least one fly species. In the presence of assembly gaps, this also covers cases where a gene is on the edge of a scaffold in multiple species. This strategy sometimes leads to lower-confidence joins, because a block might be a candidate for merging with two separate blocks, with conflicting evidence from individual species. In the absence of additional information, such joins, made by an arbitrary choice between alternative blocks, can be tagged as low-confidence joins.</p>
            </sec>
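            <p>One way that candidate joins might be screened under criteria 2 and 3 is sketched below in Python. The representation is hypothetical: each candidate join is a pair of block edges, annotated with the species in which the two edge genes form a correctly oriented pair. When a block edge has several supported partners, the sketch makes an arbitrary but deterministic choice (the first join in sorted order) and tags the join as low confidence, mirroring the tagging described for criterion 3. It does not check whether a join would close a block into a circle.</p>
            <p>
```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]   # hypothetical: (block_id, "left" or "right")
Join = Tuple[Edge, Edge]


def select_joins(candidates: Dict[Join, Set[str]], flies: Set[str],
                 outgroups: Set[str], criterion: int) -> Tuple[List[Join], List[Join]]:
    """Choose block-edge joins under criterion 2 or 3.

    candidates maps a pair of block edges to the species in which the two edge
    genes form a pair with the correct mutual orientation.
    Criterion 2 needs support from a fly species and an outgroup species;
    criterion 3 needs support from a fly species only.
    """
    supported: List[Join] = []
    for join, species in sorted(candidates.items()):
        from_fly = not species.isdisjoint(flies)
        from_outgroup = not species.isdisjoint(outgroups)
        if from_fly and (from_outgroup or criterion == 3):
            supported.append(join)
    # How many supported partners each block edge has.
    options: Dict[Edge, int] = defaultdict(int)
    for x, y in supported:
        options[x] += 1
        options[y] += 1
    joins: List[Join] = []
    low_confidence: List[Join] = []
    used: Set[Edge] = set()
    for x, y in supported:
        if x in used or y in used:
            continue            # each block edge can be joined only once
        used.update((x, y))
        joins.append((x, y))
        if options[x] + options[y] > 2:
            # Another block was also a candidate: the choice (first in sorted
            # order) is arbitrary, so the join is tagged as low confidence.
            low_confidence.append((x, y))
    return joins, low_confidence


candidates = {(("B1", "right"), ("B2", "left")): {"Dmel", "Agam"},
              (("B1", "right"), ("B3", "left")): {"Dvir"},
              (("B2", "right"), ("B3", "left")): {"Dpse"}}
flies, outgroups = {"Dmel", "Dvir", "Dpse"}, {"Agam"}
print(select_joins(candidates, flies, outgroups, criterion=2))
print(select_joins(candidates, flies, outgroups, criterion=3))
```
            </p>
            <p>Under criterion 2 only the join supported by both a fly and an outgroup is kept. Under criterion 3 two joins are made, and both are tagged as low confidence because block edges B1 right and B3 left each had two candidate partners.</p>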
         </sec>
         <sec>
            <st>
               <p>Identifying genomic regions of increased rearrangement activity</p>
            </st>
            <p>This approach makes it straightforward to identify pairs of genes in which each gene is found in multiple dissimilar pairs across all species. By applying a threshold on the number of dissimilar pairs in which each gene participates, a set of genes bordering probable regions of high rearrangement activity can be obtained. Note that identifying such regions does not directly imply rearrangement hotspots at nucleotide-level resolution.</p>
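            <p>A minimal Python sketch of this screen is shown below. It assumes hypothetical inputs: per-species lists of adjacent gene pairs, with orientation ignored for brevity, and a user-chosen threshold. The sketch counts the distinct partners of each gene across all species. It then reports the observed pairs in which both genes reach the threshold. The function name and threshold value are illustrative only.</p>
            <p>
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def hotspot_borders(pairs_by_species: Dict[str, Iterable[Tuple[str, str]]],
                    threshold: int) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Flag genes found in at least `threshold` dissimilar pairs across species.

    pairs_by_species maps a species to its adjacent gene pairs (orientation is
    ignored here for brevity). Returns the flagged genes and the observed pairs
    in which both genes are flagged: candidate borders of regions of high
    rearrangement activity.
    """
    partners: Dict[str, Set[str]] = defaultdict(set)
    observed: Set[Tuple[str, str]] = set()
    for pairs in pairs_by_species.values():
        for a, b in pairs:
            partners[a].add(b)
            partners[b].add(a)
            observed.add((min(a, b), max(a, b)))
    genes = {g for g, p in partners.items() if len(p) >= threshold}
    borders = sorted(p for p in observed if p[0] in genes and p[1] in genes)
    return genes, borders


pairs = {"s1": [("a", "b"), ("b", "c")], "s2": [("a", "c"), ("b", "d")],
         "s3": [("a", "d"), ("c", "b")]}
print(hotspot_borders(pairs, threshold=3))
```
            </p>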
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p><it>Dana</it>, <it>D. ananassae</it>; <it>Dere</it>, <it>D. erecta</it>; <it>Dgri</it>, <it>D. grimshawi</it>; <it>Dmel</it>, <it>D. melanogaster</it>; <it>Dmoj</it>, <it>D. mojavensis</it>; <it>Dpse</it>, <it>D. pseudoobscura</it>; <it>Dvir</it>, <it>D. virilis</it>; <it>Dyak</it>, <it>D. yakuba</it>; NGP, neighboring gene pair; TSP, traveling salesman problem.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AB and TFS contributed to the design and implementation of the algorithm, the analysis and interpretation of the data, and to drafting and revising the manuscript. WMG contributed to the analysis and interpretation of the data and to revising the manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a filtered list of genes common to all <it>Drosophila </it>species in the set (high-confidence placements). Additional data file <supplr sid="S2">2</supplr> lists NGPs based on the common gene set (in file 1), with arm indexing and only within <it>Drosophila </it>species (without outgroup pairs), used in clustering of fly species (Figure <figr fid="F1">1a</figr>). Additional data file <supplr sid="S3">3</supplr> lists NGPs without arm indexing (to allow for NGPs from outgroup species with varying chromosome architecture) based on the <it>Drosophila </it>common gene set (in file 1), used in the rearrangement and ancestral gene order analysis. Additional data file <supplr sid="S4">4</supplr> provides clustering results for Figure <figr fid="F1">1(a)</figr>. Additional data file <supplr sid="S5">5</supplr> provides clustering results for Figure <figr fid="F1">1(b)</figr>. Additional data file <supplr sid="S6">6</supplr> shows ancestral adjacencies (blocks) at the <it>Drosophila </it>root under criterion 1. Additional data file <supplr sid="S7">7</supplr> shows ancestral adjacencies (blocks) at the <it>Drosophila </it>root under criterion 2. Additional data file <supplr sid="S8">8</supplr> is a summary of results of testing with the mitochondrial test dataset. Additional data file <supplr sid="S9">9</supplr> is a tree file from PHYLIP version 3.65 for the <it>SRP54 </it>and <it>SRP19 </it>gene sequences used for Figure <figr fid="F2">2</figr>. Additional data file <supplr sid="S10">10</supplr> gives a detailed description of the method. The code is available from the authors upon request.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Filtered list of genes common to all <it>Drosophila </it>species in set (high confidence placements)</p>
            </caption>
            <text>
               <p>Filtered list of genes common to all <it>Drosophila </it>species in set (high confidence placements).</p>
            </text>
            <file name="gb-2007-8-11-r236-S1.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>NGPs based on common gene set (in file 1) with arm indexing and only within <it>Drosophila </it>species (without outgroup pairs)</p>
            </caption>
            <text>
               <p>NGPs based on common gene set (in file 1) with arm indexing and only within <it>Drosophila </it>species (without outgroup pairs) - used in clustering of fly species (Figure <figr fid="F1">1a</figr>).</p>
            </text>
            <file name="gb-2007-8-11-r236-S2.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>NGPs without arm indexing (to allow for NGPs from outgroup species with varying chromosome architecture) based on <it>Drosophila </it>common gene set (in file 1)</p>
            </caption>
            <text>
               <p>NGPs without arm indexing (to allow for NGPs from outgroup species with varying chromosome architecture) based on <it>Drosophila </it>common gene set (in file 1) - used in the rearrangement and ancestral gene order analysis.</p>
            </text>
            <file name="gb-2007-8-11-r236-S3.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Clustering results for Figure <figr fid="F1">1(a)</figr></p>
            </caption>
            <text>
               <p>Clustering results for Figure <figr fid="F1">1(a)</figr></p>
            </text>
            <file name="gb-2007-8-11-r236-S4.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Clustering results for Figure <figr fid="F1">1(b)</figr></p>
            </caption>
            <text>
               <p>Clustering results for Figure <figr fid="F1">1(b)</figr>.</p>
            </text>
            <file name="gb-2007-8-11-r236-S5.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>Ancestral adjacencies (blocks) at the <it>Drosophila </it>root under criterion 1</p>
            </caption>
            <text>
               <p>Ancestral adjacencies (blocks) at the <it>Drosophila </it>root under criterion 1</p>
            </text>
            <file name="gb-2007-8-11-r236-S6.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>Ancestral adjacencies (blocks) at the <it>Drosophila </it>root under criterion 2</p>
            </caption>
            <text>
               <p>Ancestral adjacencies (blocks) at the <it>Drosophila </it>root under criterion 2.</p>
            </text>
            <file name="gb-2007-8-11-r236-S7.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional data file 8</p>
            </title>
            <caption>
               <p>Summary of results of testing with the mitochondrial test dataset</p>
            </caption>
            <text>
               <p>Summary of results of testing with the mitochondrial test dataset.</p>
            </text>
            <file name="gb-2007-8-11-r236-S8.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S9">
            <title>
               <p>Additional data file 9</p>
            </title>
            <caption>
               <p>Tree file from PHYLIP version 3.65 for <it>SRP54 </it>and <it>SRP19 </it>gene sequences used for Figure <figr fid="F2">2</figr></p>
            </caption>
            <text>
               <p>Tree file from PHYLIP version 3.65 for <it>SRP54 </it>and <it>SRP19 </it>gene sequences used for Figure <figr fid="F2">2</figr>.</p>
            </text>
            <file name="gb-2007-8-11-r236-S9.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S10">
            <title>
               <p>Additional data file 10</p>
            </title>
            <caption>
               <p>Detailed description of the method</p>
            </caption>
            <text>
               <p>Detailed description of the method.</p>
            </text>
            <file name="gb-2007-8-11-r236-S10.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors wish to thank: Susan Russo (FlyBase, Harvard) for protein homology data; Professor Stanley Letovsky (Boston University) for insightful comments and suggestions; the Harvard FlyBase team for release 4.3 annotation datasets; the AAA coordinating committee for the 12 <it>Drosophila </it>genomes project (all <it>Drosophila </it>data are part of the AAA CAF1 freeze and have been used in accordance with AAA guidelines for a companion paper); Agencourt Biosciences for sequence data for the <it>Dere</it>, <it>Dana</it>, <it>Dvir</it>, <it>Dmoj </it>and <it>Dgri </it>genome assemblies; Baylor College of Medicine HGSC for the <it>Dpse </it>and <it>Tribolium castaneum </it>genome assemblies; Washington University GSC for the assembly of <it>Dyak</it>; Broad Institute and TIGR for the <it>Aedes aegypti </it>genome assembly; Venky Iyer (Eisen Lab, UC Berkeley) and AAA for releasing GLEANR predictions used for <it>SRP54 </it>and <it>SRP19 </it>homologous sequences in <it>Drosophila </it>species; BMERC (Boston University) computing and support staff; and Nancy Sands for proofreading the manuscript. This work was supported by a subcontract from Harvard University under NIH grant HG000739.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Salivary chromosome maps with a key to the banding of the chromosomes of <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bridges</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>J Hered</source>
            <pubdate>1935</pubdate>
            <volume>26</volume>
            <fpage>60</fpage>
            <lpage>64</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Microgeographic variation in <it>Drosophila pseudoobscura</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Dobzhansky</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1939</pubdate>
            <volume>25</volume>
            <fpage>311</fpage>
            <lpage>314</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The comparative genetics of <it>Drosophila pseudoobscura </it>and <it>D. melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>J Genet</source>
            <pubdate>1937</pubdate>
            <volume>34</volume>
            <fpage>415</fpage>
            <lpage>432</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The homologies of the chromosome elements in the genus <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Novitski</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1941</pubdate>
            <volume>26</volume>
            <fpage>517</fpage>
            <lpage>541</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1209144</pubid>
                  <pubid idtype="pmpid" link="fulltext">17247021</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Inversions in the chromosomes of <it>Drosophila pseudoobscura</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Dobzhansky</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>AH</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1938</pubdate>
            <volume>23</volume>
            <fpage>28</fpage>
            <lpage>64</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1209001</pubid>
                  <pubid idtype="pmpid" link="fulltext">17246876</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Geographical distribution and cytology of 'sex ratio' in <it>Drosophila pseudoobscura </it>and related species.</p>
            </title>
            <aug>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Dobzhansky</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1936</pubdate>
            <volume>21</volume>
            <fpage>473</fpage>
            <lpage>490</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1208687</pubid>
                  <pubid idtype="pmpid" link="fulltext">17246805</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The relations of inversions in the X chromosome of <it>Drosophila melanogaster </it>to crossing over and disjunction.</p>
            </title>
            <aug>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Beadle</snm>
                  <fnm>GW</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1936</pubdate>
            <volume>21</volume>
            <fpage>544</fpage>
            <lpage>604</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Inversions in the third chromosome of wild races of <it>Drosophila pseudoobscura</it>, and their use in the study of the history of the species.</p>
            </title>
            <aug>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Dobzhansky</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1936</pubdate>
            <volume>22</volume>
            <fpage>448</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1076803</pubid>
                  <pubid idtype="pmpid">16577723</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Chromosomal phylogeny and evolution of gibbons (Hylobatidae).</p>
            </title>
            <aug>
               <au>
                  <snm>Muller</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hollatz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wienberg</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Hum Genet</source>
            <pubdate>2003</pubdate>
            <volume>113</volume>
            <fpage>493</fpage>
            <lpage>501</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14569461</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Chromosome inversions, local adaptation, and speciation.</p>
            </title>
            <aug>
               <au>
                  <snm>Kirkpatrick</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2006</pubdate>
            <volume>173</volume>
            <fpage>419</fpage>
            <lpage>434</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461441</pubid>
                  <pubid idtype="pmpid" link="fulltext">16204214</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <aug>
               <au>
                  <snm>White</snm>
                  <fnm>MJD</fnm>
               </au>
            </aug>
            <source>Animal Cytology and Evolution</source>
            <publisher>Cambridge: Cambridge University Press</publisher>
            <pubdate>1973</pubdate>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Genome sequence of the chlorinated compound-respiring bacterium <it>Dehalococcoides </it>species strain CBDB1.</p>
            </title>
            <aug>
               <au>
                  <snm>Kube</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Beck</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zinder</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Kuhl</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Reinhardt</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Adrian</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1269</fpage>
            <lpage>1273</lpage>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Comparative genome sequencing of <it>Drosophila pseudoobscura</it>: Chromosomal, gene, and cis-element evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bettencourt</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Hradecky</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Letovsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hubisz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Meisel</snm>
                  <fnm>RP</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1</fpage>
            <lpage>18</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome.</p>
            </title>
            <aug>
               <au>
                  <cnm>Mouse Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12466850</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Human chromosome 3 and pig chromosome 13 show complete synteny conservation but extensive gene-order differences.</p>
            </title>
            <aug>
               <au>
                  <snm>Sun</snm>
                  <fnm>HF</fnm>
               </au>
               <au>
                  <snm>Ernst</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Yerle</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pinton</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rothschild</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Chardon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rogel-Gaillard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tuggle</snm>
                  <fnm>CK</fnm>
               </au>
            </aug>
            <source>Cytogenet Cell Genet</source>
            <pubdate>1999</pubdate>
            <volume>85</volume>
            <fpage>273</fpage>
            <lpage>278</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10449917</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Initial sequence of the chimpanzee genome and comparison with the human genome.</p>
            </title>
            <aug>
               <au>
                  <cnm>The Chimpanzee Sequencing and Analysis Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>69</fpage>
            <lpage>87</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16136131</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Genome evolution in yeasts.</p>
            </title>
            <aug>
               <au>
                  <snm>Dujon</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sherman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Fischer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Durrens</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Casaregola</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lafontaine</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>De Montigny</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Marck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Neuveglise</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Talla</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>430</volume>
            <fpage>35</fpage>
            <lpage>44</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15229592</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Initial sequencing and analysis of the human genome.</p>
            </title>
            <aug>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11237011</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals).</p>
            </title>
            <aug>
               <au>
                  <snm>Hannenhalli</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
            <source>Proceedings of the 27th Annual ACM Symposium on Theory of Computing: 1995</source>
            <pubdate>1995</pubdate>
            <fpage>178</fpage>
            <lpage>189</lpage>
            <note>May 29 - June 01, 1995; Editors: F. Tom Leighton, Allan Borodin; Las Vegas, Nevada; ACM Press, New York, NY, USA</note>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement.</p>
            </title>
            <aug>
               <au>
                  <snm>Kececioglu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sankoff</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Algorithmica</source>
            <pubdate>1995</pubdate>
            <volume>13</volume>
            <fpage>180</fpage>
            <lpage>210</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Breakpoint phylogenies.</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bourque</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sankoff</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Inform Ser Workshop Genome Inform</source>
            <pubdate>1997</pubdate>
            <volume>8</volume>
            <fpage>25</fpage>
            <lpage>34</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>The median problem for breakpoints in comparative genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sankoff</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proceedings of the Third Annual International Conference on Computing and Combinatorics: 1997</source>
            <publisher>Springer Verlag</publisher>
            <editor>Jiang T, Lee DT</editor>
            <pubdate>1997</pubdate>
            <fpage>251</fpage>
            <lpage>263</lpage>
            <note>[<it>Lecture Notes in Computer Science</it>, vol. 1276] August 20-22; Shanghai, China; Springer-Verlag, London, UK</note>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Reconstructing phylogenies from gene-content and gene-order data.</p>
            </title>
            <aug>
               <au>
                  <snm>Moret</snm>
                  <fnm>BME</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Warnow</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mathematics of Evolution and Phylogeny</source>
            <publisher>Oxford: Oxford University Press</publisher>
            <editor>Gascuel O</editor>
            <pubdate>2005</pubdate>
            <fpage>321</fpage>
            <lpage>352</lpage>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Fast phylogenetic methods for the analysis of genome rearrangement data: an empirical study.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Jansen</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Moret</snm>
                  <fnm>BME</fnm>
               </au>
               <au>
                  <snm>Raubeson</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Warnow</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proceedings of the 7th Pacific Symposium on Biocomputing: 2002; Hawaii</source>
            <publisher>World Scientific Pub</publisher>
            <pubdate>2002</pubdate>
            <fpage>524</fpage>
            <lpage>535</lpage>
            <note>January 3-7 2002; Lihue, Hawaii, USA; Editors: Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Teri E. Klein; World Scientific, New Jersey, USA</note>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Steps toward accurate reconstruction of phylogenies from gene-order data.</p>
            </title>
            <aug>
               <au>
                  <snm>Moret</snm>
                  <fnm>BME</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Warnow</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Comput Syst Sci</source>
            <pubdate>2002</pubdate>
            <volume>65</volume>
            <fpage>508</fpage>
            <lpage>525</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Formulations and hardness of multiple sorting by reversals.</p>
            </title>
            <aug>
               <au>
                  <snm>Caprara</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proceedings of the 3rd International Conference on Computational Molecular Biology</source>
            <publisher>ACM Press</publisher>
            <pubdate>1999</pubdate>
            <fpage>84</fpage>
            <lpage>93</lpage>
            <note>April 11-14 1999; Lyon, France; Editors: S. Istrail, P. Pevzner, M. Waterman; ACM Press, New York, NY, USA</note>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The median problems for breakpoints are NP-complete.</p>
            </title>
            <aug>
               <au>
                  <snm>Pe'er</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Elec Colloq Comput Complexity</source>
            <pubdate>1998</pubdate>
            <volume>71</volume>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Multidirectional chromosome painting reveals a remarkable syntenic homology between the greater galagos and the slow loris.</p>
            </title>
            <aug>
               <au>
                  <snm>Stanyon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dumas</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Stone</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bigoni</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Am J Primatol</source>
            <pubdate>2006</pubdate>
            <volume>68</volume>
            <fpage>349</fpage>
            <lpage>359</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16534804</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Inversions and the dynamics of eukaryotic gene order.</p>
            </title>
            <aug>
               <au>
                  <snm>Huynen</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>304</fpage>
            <lpage>306</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11377779</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>The <it>Anopheles </it>genome and comparative insect genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Severson</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>GE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>298</volume>
            <fpage>97</fpage>
            <lpage>98</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12364783</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Molecular phylogeny and divergence times of drosophilid species.</p>
            </title>
            <aug>
               <au>
                  <snm>Russo</snm>
                  <fnm>CAM</fnm>
               </au>
               <au>
                  <snm>Takezaki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1995</pubdate>
            <volume>12</volume>
            <fpage>391</fpage>
            <lpage>404</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7739381</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <aug>
               <au>
                  <snm>Powell</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Progress and Prospects in Evolutionary Biology: The <it>Drosophila </it>Model</source>
            <publisher>New York: Oxford University Press</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Research resources for <it>Drosophila</it>: the expanding universe.</p>
            </title>
            <aug>
               <au>
                  <snm>Matthews</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>179</fpage>
            <lpage>193</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15738962</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Fly meets shotgun: shotgun wins.</p>
            </title>
            <aug>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>24</volume>
            <fpage>327</fpage>
            <lpage>328</lpage>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Low occurrence of gene transposition events during the evolution of the genus <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Ranz</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gonz&#225;lez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Casals</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Int J Org Evolution</source>
            <pubdate>2003</pubdate>
            <volume>57</volume>
            <fpage>1325</fpage>
            <lpage>1335</lpage>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Bearings of the <it>Drosophila </it>work on systematics.</p>
            </title>
            <aug>
               <au>
                  <snm>Muller</snm>
                  <fnm>HJ</fnm>
               </au>
            </aug>
            <source>The New Systematics</source>
            <publisher>Oxford, UK: Clarendon Press</publisher>
            <editor>Huxley J</editor>
            <pubdate>1940</pubdate>
            <fpage>185</fpage>
            <lpage>268</lpage>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Retroposed new genes out of the X in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Betr&#225;n</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1854</fpage>
            <lpage>1859</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187566</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Dntf-2r, a young <it>Drosophila </it>retroposed gene with specific male expression under positive Darwinian selection.</p>
            </title>
            <aug>
               <au>
                  <snm>Betr&#225;n</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2003</pubdate>
            <volume>164</volume>
            <fpage>977</fpage>
            <lpage>988</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1462638</pubid>
                  <pubid idtype="pmpid" link="fulltext">12871908</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Molecular phylogeny of the <it>Drosophila melanogaster </it>species subgroup.</p>
            </title>
            <aug>
               <au>
                  <snm>Ko</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>David</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Akashi</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2003</pubdate>
            <volume>57</volume>
            <fpage>562</fpage>
            <lpage>573</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14738315</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Molecular evolution of the duplicated <it>Amy </it>locus in the <it>Drosophila melanogaster </it>species subgroup: Concerted evolution only in the coding region and an excess of nonsynonymous substitutions in speciation.</p>
            </title>
            <aug>
               <au>
                  <snm>Shibata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yamazaki</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1995</pubdate>
            <volume>141</volume>
            <fpage>223</fpage>
            <lpage>236</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1206720</pubid>
                  <pubid idtype="pmpid" link="fulltext">8536970</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>The molecular evolution of the alcohol dehydrogenase and alcohol dehydrogenase-related genes in the <it>Drosophila melanogaster </it>species subgroup.</p>
            </title>
            <aug>
               <au>
                  <snm>Jeffs</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1994</pubdate>
            <volume>11</volume>
            <fpage>287</fpage>
            <lpage>304</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8170369</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Assessing the impact of comparative genomic sequence data on the functional annotation of the <it>Drosophila </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Rinc&#243;n-Limas</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Hoskins</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gnirke</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Kronmiller</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Pacleb</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Conservation of regulatory sequences and gene expression patterns in the disintegrating <it>Drosophila </it>Hox gene complex.</p>
            </title>
            <aug>
               <au>
                  <snm>Negre</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Casillas</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Suzanne</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>S&#225;nchez-Herrero</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Akam</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nefedov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barbadilla</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>692</fpage>
            <lpage>700</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1088297</pubid>
                  <pubid idtype="pmpid" link="fulltext">15867430</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>FlyBase: genes and gene models.</p>
            </title>
            <aug>
               <au>
                  <snm>Drysdale</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <cnm>FlyBase Consortium</cnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>Database issue</issue>
            <fpage>D390</fpage>
            <lpage>D395</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540000</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608223</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>How malleable is the eukaryotic genome? Extreme rate of chromosomal rearrangement in the genus <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Ranz</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Casals</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>230</fpage>
            <lpage>239</lpage>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Relationships within the <it>melanogaster </it>species subgroup of the genus <it>Drosophila </it>(<it>Sophophora</it>) IV. The chromosomes of two new species.</p>
            </title>
            <aug>
               <au>
                  <snm>Lemeunier</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Chromosoma</source>
            <pubdate>1984</pubdate>
            <volume>89</volume>
            <fpage>343</fpage>
            <lpage>351</lpage>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Relationships within the <it>melanogaster </it>species subgroup of the genus <it>Drosophila </it>(<it>Sophophora</it>). II. Phylogenetic relationships between six species based upon polytene chromosome banding sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Lemeunier</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Proc R Soc Lond B Biol Sci</source>
            <pubdate>1976</pubdate>
            <volume>193</volume>
            <fpage>275</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6967</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>On the abundance and distribution of transposable elements in the genome of <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bartolome</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Maside</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Charlesworth</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>926</fpage>
            <lpage>937</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12032249</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Recombination rate and the distribution of transposable elements in the <it>Drosophila melanogaster </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Rizzon</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Marais</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gouy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Biemont</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>400</fpage>
            <lpage>407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155295</pubid>
                  <pubid idtype="pmpid" link="fulltext">11875027</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Generation of a widespread <it>Drosophila </it>inversion by a transposable element.</p>
            </title>
            <aug>
               <au>
                  <snm>Caceres</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ranz</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Barbadilla</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>285</volume>
            <fpage>415</fpage>
            <lpage>418</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10411506</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>The Foldback-like transposon Galileo is involved in the generation of two different natural chromosomal inversions of <it>Drosophila buzzatii</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Casals</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Caceres</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>674</fpage>
            <lpage>685</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12679549</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Molecular characterization and chromosomal distribution of Galileo, Kepler and Newton, three foldback transposable elements of the <it>Drosophila buzzatii </it>species complex.</p>
            </title>
            <aug>
               <au>
                  <snm>Casals</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Caceres</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Manfrin</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Gonzalez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2005</pubdate>
            <volume>169</volume>
            <fpage>2047</fpage>
            <lpage>2059</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1449584</pubid>
                  <pubid idtype="pmpid" link="fulltext">15695364</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Mobile elements and chromosomal evolution in the <it>virilis </it>group of <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Evgen'ev</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Zelentsova</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Poluectova</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lyozin</snm>
                  <fnm>GT</fnm>
               </au>
               <au>
                  <snm>Veleikodvorskaja</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Pyatkov</snm>
                  <fnm>KI</fnm>
               </au>
               <au>
                  <snm>Zhivotovsky</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Kidwell</snm>
                  <fnm>MG</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>11337</fpage>
            <lpage>11342</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">17201</pubid>
                  <pubid idtype="pmpid" link="fulltext">11016976</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Multiple genome rearrangement and breakpoint phylogeny.</p>
            </title>
            <aug>
               <au>
                  <snm>Sankoff</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1998</pubdate>
            <volume>5</volume>
            <fpage>555</fpage>
            <lpage>570</lpage>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Genome-scale evolution: reconstructing gene orders in the ancestral species.</p>
            </title>
            <aug>
               <au>
                  <snm>Bourque</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>26</fpage>
            <lpage>36</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155248</pubid>
                  <pubid idtype="pmpid" link="fulltext">11779828</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>GRIMM: Genome rearrangements web server.</p>
            </title>
            <aug>
               <au>
                  <snm>Tesler</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>492</fpage>
            <lpage>493</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11934753</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>MGA Source Guide</p>
            </title>
            <aug>
               <au>
                  <snm>Boore</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <url>http://evogen.jgi.doe.gov</url>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Mitochondrial genomes of Galathealinum, Helobdella, and Platynereis: sequence and gene arrangement comparisons indicate that Pogonophora is not a phylum and Annelida and Arthropoda are not sister taxa.</p>
            </title>
            <aug>
               <au>
                  <snm>Boore</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>87</fpage>
            <lpage>106</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666709</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Rates and patterns of chromosomal evolution in <it>Drosophila pseudoobscura </it>and <it>D. miranda</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bartolome</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Charlesworth</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2006</pubdate>
            <volume>173</volume>
            <fpage>773</fpage>
            <lpage>791</lpage>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Temporal patterns of fruit fly (<it>Drosophila</it>) evolution revealed by mutation clocks.</p>
            </title>
            <aug>
               <au>
                  <snm>Tamura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>36</fpage>
            <lpage>44</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12949132</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>A new fast heuristic for computing the breakpoint phylogeny and experimental phylogenetic analyses of real and synthetic data.</p>
            </title>
            <aug>
               <au>
                  <snm>Cosner</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Jansen</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Moret</snm>
                  <fnm>BME</fnm>
               </au>
               <au>
                  <snm>Raubeson</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Warnow</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wyman</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology: 2000; San Diego</source>
            <pubdate>2000</pubdate>
            <fpage>104</fpage>
            <lpage>115</lpage>
            <note>August 19-23; La Jolla, CA, USA; Editors: Philip Bourne, Michael Gribskov, Russ Altman, Nancy Jensen, Debra Hope, Thomas Lengauer, Julie Mitchell, Eric Scheeff, Chris Smith, Shawn Strande, and Helge Weissig; AAAI Press, CA, USA</note>
         </bibl>
         <bibl id="B62">
            <title>
               <p>An empirical comparison of phylogenetic methods on chloroplast gene order data in Campanulaceae.</p>
            </title>
            <aug>
               <au>
                  <snm>Cosner</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Jansen</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Moret</snm>
                  <fnm>BME</fnm>
               </au>
               <au>
                  <snm>Raubeson</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Warnow</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wyman</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment, and the Evolution of Gene Families</source>
            <publisher>Dordrecht, Netherlands: Kluwer Academic Publishers</publisher>
            <editor>Sankoff D, Nadeau J</editor>
            <pubdate>2000</pubdate>
            <fpage>99</fpage>
            <lpage>121</lpage>
         </bibl>
         <bibl id="B63">
            <title>
               <p>FlyBase</p>
            </title>
            <url>http://flybase.org</url>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Assembly/Alignment/Annotation of 12 related <it>Drosophila </it>Species</p>
            </title>
            <url>http://rana.lbl.gov/drosophila</url>
         </bibl>
         <bibl id="B65">
            <title>
               <p>The genome sequence of the malaria mosquito <it>Anopheles gambiae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Charlab</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Nusskern</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Ribeiro</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Wides</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>298</volume>
            <fpage>129</fpage>
            <lpage>149</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12364791</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Ensembl: <it>Anopheles gambiae </it>version AgamP3</p>
            </title>
            <url>http://www.ensembl.org/Anopheles_gambiae</url>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Genome sequence of <it>Aedes aegypti</it>, a major arbovirus vector.</p>
            </title>
            <aug>
               <au>
                  <snm>Nene</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Lawson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kodira</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Loftus</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Xi</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Megy</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Grabherr</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2007</pubdate>
            <volume>316</volume>
            <fpage>1718</fpage>
            <lpage>1723</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17510324</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Ensembl: <it>Aedes aegypti </it>version AaegL1</p>
            </title>
            <url>http://www.ensembl.org/Aedes_aegypti/index.html</url>
         </bibl>
         <bibl id="B69">
            <title>
               <p>BCM: <it>Apis mellifera </it>Version 4.0</p>
            </title>
            <url>ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Amellifera</url>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Insights into social insects from the genome of the honeybee <it>Apis mellifera</it>.</p>
            </title>
            <aug>
               <au>
                  <cnm>Honeybee Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>443</volume>
            <fpage>931</fpage>
            <lpage>949</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2048586</pubid>
                  <pubid idtype="pmpid">17073008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>BCM: <it>Tribolium castaneum </it>Release 2</p>
            </title>
            <url>http://www.hgsc.bcm.tmc.edu/projects/tribolium/</url>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Techniques for multi-genome synteny analysis to overcome assembly limitations.</p>
            </title>
            <aug>
               <au>
                  <snm>Bhutkar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Russo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Genome Informatics</source>
            <pubdate>2006</pubdate>
            <volume>17</volume>
            <fpage>152</fpage>
            <lpage>161</lpage>
            <xrefbib>
               <pubid idtype="pmpid">17503388</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>PHYLIP - Phylogeny Inference Package (Version 3.2).</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cladistics</source>
            <pubdate>1989</pubdate>
            <volume>5</volume>
            <fpage>164</fpage>
            <lpage>166</lpage>
         </bibl>
      </refgrp>
   </bm>
</art>
