
Open Access Research

Conservation of long-range synteny and microsynteny between the genomes of two distantly related nematodes

DB Guiliano1, N Hall2, SJM Jones3, LN Clark2, CH Corton2, BG Barrell2 and ML Blaxter1*

Author Affiliations

1 Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, UK

2 Pathogen Sequencing Unit, The Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

3 Genome Sequence Centre, British Columbia Cancer Research Centre, Vancouver V5Z 4E6, Canada


Genome Biology 2002, 3:research0057-research0057.14  doi:10.1186/gb-2002-3-10-research0057


The electronic version of this article is the complete one and can be found online at: http://genomebiology.com/2002/3/10/research/0057


Received: 22 May 2002
Revisions received: 19 July 2002
Accepted: 22 August 2002
Published: 26 September 2002

© 2002 Guiliano et al., licensee BioMed Central Ltd

Abstract

Background

Comparisons between the genomes of the closely related nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal high rates of rearrangement, with a bias towards within-chromosome events. To assess whether this pattern is true of nematodes in general, we used genome sequence to compare two nematode species that last shared a common ancestor approximately 300 million years ago: the model nematode C. elegans and the filarial parasite Brugia malayi.

Results

An 83 kb region flanking the gene for Bm-mif-1 (macrophage migration inhibitory factor, a B. malayi homolog of a human cytokine) was sequenced. Comparison with the complete genome of C. elegans revealed evidence for conservation of both long-range synteny and microsynteny. Potential C. elegans orthologs were identified for 11 of the 12 protein-coding genes predicted in the B. malayi sequence. Ten of these orthologs were located on chromosome I, with eight clustered in a 2.3 Mb region. Although several relatively local intrachromosomal rearrangements have occurred, the order, composition, and configuration of two gene clusters, each containing three genes, were conserved. Comparison of B. malayi BAC-end genome survey sequence with C. elegans also revealed a bias towards intrachromosomal rearrangements.

Conclusions

We suggest that intrachromosomal rearrangement is a major force driving chromosomal organization in nematodes, but is constrained by the interdigitation of functional elements of neighboring genes.

Background

All genomes encode conserved genes. The arrangement of these genes on chromosomal elements is determined by a balance between stochastic rearrangements and functional constraints. The level of conservation of gene order (synteny) and linkage between two genomes will depend on the relative contributions of inter- and intrachromosomal rearrangements. Whereas shared ancestry and functional constraints will increase conservation of linkage and synteny between taxa, rearrangement events will tend to randomize gene order over time. In the Metazoa, several gene clusters have been identified that remain linked because of functional constraints. These include the histone genes [1], the Hox gene clusters [2], the immunoglobulin cluster [3], and the major histocompatibility complex (MHC) [4], but most genes are believed to be free to move within the genome. The tempo of gene rearrangement varies between taxa [5,6]. Vertebrate chromosomes are mosaic structures containing large conserved segments that can reside in different linkage groups in different species. There is a surprising conservation of synteny between distantly related species (approximately 450 million years (Myr) divergence) [7]. However, some lineages, such as rodents, show more extensive rearrangement than others, such as teleosts.

In protostomes, comparative studies of the genomes of closely related dipterans (Drosophila sp. and Aedes aegypti [5,8]) and nematodes (Caenorhabditis elegans and C. briggsae [6,9]) revealed a high rate of rearrangement. Chromosome rearrangements between closely related Drosophila species are mainly large pericentric inversions that may be facilitated by flanking transposon sequences [10,11]. C. elegans and C. briggsae are closely related, with estimates of 25-120 Myr divergence based on sequence comparisons [6,12]. Two groups have assessed genome rearrangement rates and modes by comparing these two species. Kent and Zahler [9] compared 8.1 megabases (Mb) of fragmentary C. briggsae sequence, derived from sequenced cosmid clones, with C. elegans and obtained a mean syntenic fragment length of 8.6 kilobases (kb), or approximately 1.8 genes (there is one gene per 5 kb in C. elegans) [13]. In contrast, Coghlan and Wolfe [6], comparing 12.9 Mb of C. briggsae cosmid-derived sequence, found a mean syntenic fragment length of 53 kb. The difference appears to be purely methodological: Kent and Zahler analyzed a subset of the data of Coghlan and Wolfe, and the longer fragments probably reflect the more relaxed definition of matching genes and the use of cosmid-fingerprint physical map information in the latter study [6]. Estimates of the rates of intrachromosomal and interchromosomal rearrangement showed that both were very high (approximately fourfold higher than in D. melanogaster). Again, repeat sequences were associated with rearrangement boundaries [6]. It remains to be established whether this high rate of rearrangement is peculiar to the Caenorhabditis lineage or is a general feature of nematode genomes.

To address this question we have begun analysis of a third nematode genome, that of the human filarial parasite Brugia malayi, which is estimated to have last shared a common ancestor with C. elegans 300-500 Myr ago [14]. B. malayi has an estimated genome size of 100 Mb [15] and a gene complement thought to be similar to that of C. elegans [16], and is the subject of a mature genome project based on expressed sequence tags (ESTs) [16,17]. Unlike C. elegans, which has five autosomes and an XX/XO sex-determination system [18], B. malayi has four autosomes and an XX/XY system [19]. The small size of condensed nematode chromosomes has precluded accurate in situ analysis of the conservation of gene order. We have therefore taken a sequence-based approach. Here we compare an 83 kb region surrounding the B. malayi macrophage migration inhibitory factor 1 locus (Bm-mif-1), which encodes a homolog of a vertebrate cytokine [20], with the C. elegans genome, and find evidence for conservation of linkage and microsynteny between these two distantly related nematodes. The general features of this comparison were confirmed using a survey of B. malayi genome sequences.

Results

General sequence features of an 83 kb segment of the B. malayi genome

Two overlapping bacterial artificial chromosome clones (BACs) spanning the Bm-mif-1 locus were isolated. The inserts of BMBAC01L03 and BMBAC01P19 were 28,757 base pairs (bp) and 64,685 bp, respectively, with 10,637 bp of overlap, yielding a contiguated sequence of 82,805 bp (Figure 1). Overall AT content was 68.0%; exonic DNA had an AT content of 59.9%, and intergenic and intronic DNA had AT contents of 69.3% and 70.4%, respectively. The average predicted gene size was 4.7 kb (range 0.6-20 kb). The average distance between genes was 3.1 kb (range 0.3-10.5 kb), giving an average gene density of one gene per 6.9 kb. There was an average of 9.3 introns per gene, with an average intron length of 316 bp (range 48-2,767 bp). The C. elegans orthologs of the B. malayi genes (see below) had a mean length of 3.2 kb, with an average of 5.5 introns per gene (mean intron size 142 bp). The B. malayi genes were therefore longer because their introns were both longer and more numerous. Comparison with the presumed C. elegans orthologs (see below) showed that only about half of C. elegans introns were conserved in B. malayi (29 of 56 introns), and only about a quarter of B. malayi introns (29 of 107) were conserved in C. elegans (Table 1). Of the 12 predicted B. malayi genes, seven were tested and confirmed by cDNA-PCR, and alternatively spliced transcripts were identified for four. Five of the 12 genes had corresponding ESTs (Table 1).
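The contig length, gene density, and intron ratios quoted above follow directly from the stated figures. A minimal Python check, using only numbers given in the text (nothing here is new data), reproduces them; the exact intron-conservation fractions come out at roughly 52% and 27%.

```python
# All inputs are figures quoted in the text above.
bac_l03_bp = 28_757
bac_p19_bp = 64_685
overlap_bp = 10_637
n_genes = 12


def merged_length(a: int, b: int, overlap: int) -> int:
    """Length of a contig built from two clones that share `overlap` bp."""
    return a + b - overlap


contig_bp = merged_length(bac_l03_bp, bac_p19_bp, overlap_bp)
kb_per_gene = contig_bp / n_genes / 1000

# Intron-position conservation between presumed ortholog pairs
ce_conserved = 29 / 56    # C. elegans introns also present in B. malayi
bm_conserved = 29 / 107   # B. malayi introns also present in C. elegans

# B. malayi versus C. elegans: mean intron length and introns per gene
intron_len_ratio = 316 / 142
intron_num_ratio = 9.3 / 5.5

print(contig_bp)                                            # 82805
print(f"{kb_per_gene:.1f} kb per gene")                     # 6.9 kb per gene
print(f"{ce_conserved:.0%} {bm_conserved:.0%}")             # 52% 27%
print(f"{intron_len_ratio:.1f}x {intron_num_ratio:.1f}x")   # 2.2x 1.7x
```

The same two ratios (2.2-fold longer, 1.7-fold more frequent introns) are the ones cited in the Discussion.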

Figure 1. The BMBAC01L03/BMBAC01P19 contig compared to the C. elegans genome. Genes are indicated by exon (box) and intron (bracket) structures. For each species, the direction of transcription of the genes is indicated by an arrow. The C. elegans gene structures are drawn to the same scale as the B. malayi contig. A, Match to B. malayi EST cluster BMC03169 [16]. Brugia EST (BMC) and Onchocerca volvulus (OVC) clusters are viewable in NemBase [39,60]. B, Highly similar to O. volvulus EST cluster OVC02481 [61]. C, Match to B. malayi EST cluster BMC00238. D, Match to B. malayi EST clusters BMC02055 and BMC01932. However, no ORF was identified, and it may not represent protein-coding sequence (see text for discussion). E, Match to B. malayi EST cluster BMC06334. F, Match to B. malayi EST cluster BMC00400. G, BMBAC01L03.1 and BMBAC01P19.7 are gene fragments. Percent identity was calculated on the alignable portion of the C. elegans ortholog. H, F13G3.9 (Ce-mif-3) is on C. elegans chromosome I. However, F13G3.9 is not the predicted ortholog of Bm-mif-1 and thus the relationship is indicated by a dashed arrow (see text). I, Percent identity was calculated for BMBAC01P19.3 and BMBAC01L03.4 only within the PWWP or dnaJ domains respectively. Homolog pairs are indicated by the colouring of the gene models.

Table 1. Genes predicted on the BMBAC01L03/BMBAC01P19 contig

Comparison of predicted genes to C. elegans

All 12 predicted genes had C. elegans homologs, but putative orthology could be assigned for only 11 pairs (Figure 1, Table 1). These orthology assignments are necessarily provisional: because the complete genome sequence of B. malayi is not known, B. malayi genes more similar to these C. elegans comparators could exist. We note, however, that no B. malayi EST-defined genes (23,000 ESTs defining approximately 8,300 genes [16]) have better matches to these C. elegans proteins (data not shown), and that orthology assignment took into account coextension of the proteins and conservation of intron position and phase (Table 1). The exception, BMBAC01L03.3, contained two domains: an amino-terminal LON ATP-dependent serine protease domain (domain PF02190) and an anonymous carboxy-terminal domain (PFB022940). Proteins predicted from the Arabidopsis thaliana (AAC42255.1), Mus musculus (NP_067424), and Homo sapiens (XP_0421219) genomes share this architecture, but no C. elegans protein has both domains.

Some genes were similar to hypothetical, functionally uncharacterized genes from C. elegans. BMBAC01P19.7a/b had multiple predicted transmembrane segments, also found in a number of peptides from other species (PFB002843), and were most similar to C36B1.12 (60% identity). BMBAC01P19.3a has only one known homolog in any organism: F43G9.4 from C. elegans. The amino termini of both BMBAC01P19.3a and F43G9.4 contained PWWP domains (PF00855), which are found in nuclear proteins with roles in cell growth and differentiation [21,22]. PSORT profiling indicated that BMBAC01P19.3 and F43G9.4 were likely to have nuclear localizations. The amino terminus of BMBAC01L03.4 contains a dnaJ-like domain (PF00684). The dnaJ domain is found in 41 C. elegans proteins, but BMBAC01L03.4 showed highest identity (57%) to F39B2.10. Both proteins had the dnaJ domain at their amino terminus and shared the position of the first intron in this region. The remainder of the protein was not conserved.

BMBAC01P19.1 encodes Bm-mif-1 (Figure 2) [20]. Mammalian MIF is a cytokine involved in inflammation, growth, and differentiation of immune cells [23]: B. malayi MIF-1 may have a role in immunomodulation of the host [20,24]. C. elegans has four MIF-like genes: Ce-mif-1 (Y56A3A.3), Ce-mif-2 (C52E4.2), Ce-mif-3 (F13G3.9), and Ce-mif-4 (Y73B6BL.13). Transgenic reporter and immunolocalization studies suggest that C. elegans MIFs may have roles in development and the dauer stage [13,25]. Bm-MIF-1 has highest pairwise similarity to Ce-MIF-1 (41% compared to 23-29% for the other three paralogues; Figure 2) [20], and phylogenetic analysis of over seventy MIF-like proteins from eukaryotes confirms this assignment (D.B.G. and M.L.B., manuscript in preparation). Comparison of Bm-MIF-1 to the C. elegans MIFs, a second B. malayi MIF (Bm-MIF-2), and human MIF-1 (Figure 2) revealed that Bm-mif-1 and Ce-mif-1 shared two intron/exon boundaries also found in vertebrate MIFs. One of these introns was also present in Ce-mif-3, but Ce-mif-3 and the other two C. elegans mif genes shared a set of introns not present in the mif-1 genes. Bm-MIF-1 and other filarial MIF-1 homologs contain a CXXC motif (single-letter amino-acid code) critical for the thiol-oxidoreductase activities of vertebrate MIF [26]. None of the C. elegans MIF homologs contained this motif.

Figure 2. Comparison of B. malayi and C. elegans MIF proteins. Bm-MIF-1 (accession AAC82502) was aligned with human Hs-MIF-1 (AAA21814), C. elegans MIF homologs Ce-MIF-1 (CAB60512), Ce-MIF-2 (CAB01412), Ce-MIF-3 (CAA95795), Ce-MIF-4 (AAG23475), and Bm-MIF-2b (AAF91074). Intron positions are marked by triangles (red, conserved with Hs-MIF-1; blue, Ce-MIF-2, -3 and -4 specific). The proline at position 2 (white) is important for immune function, and the CXXC motif at positions 60-63 is essential for thiol-oxidoreductase activity in mammalian MIF. The percent identity of each protein to Bm-MIF-1 is given at the end of the alignment.

Conserved gene clusters

Two clusters of three genes in close proximity are conserved. The first involves BMBAC01L03.2, BMBAC01L03.5, and BMBAC01P19.3, whose C. elegans orthologs are F43G9.5, F43G9.3, and F43G9.4, respectively. F43G9.5 and F43G9.3 are divergently transcribed from a 631 bp intergenic region. F43G9.3 is followed by F43G9.4 in the same transcriptional orientation, with 501 bp separating the genes. In B. malayi this local synteny is conserved, except that two additional genes, BMBAC01L03.3 and BMBAC01L03.4, lie between BMBAC01L03.2 and BMBAC01L03.5.
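Conservation of gene order despite inserted genes can be made concrete with a small sketch: project the B. malayi gene order onto C. elegans through the ortholog map, drop genes that have no ortholog in the cluster, and compare the result with the C. elegans order read in either direction. The ortholog pairs follow those given in the text (BMBAC01L03.2 with F43G9.5, BMBAC01L03.5 with F43G9.3, and BMBAC01P19.3 with its sole homolog F43G9.4); placing BMBAC01P19.3 immediately after BMBAC01L03.5 is a simplifying assumption. The helper names are hypothetical, and the check ignores strand, so it tests order only, not transcriptional orientation.

```python
# Hypothetical encoding of the first conserved cluster (gene names from the text;
# the exact position of BMBAC01P19.3 after BMBAC01L03.5 is an assumption).
bm_order = ["BMBAC01L03.2", "BMBAC01L03.3", "BMBAC01L03.4",
            "BMBAC01L03.5", "BMBAC01P19.3"]
ce_order = ["F43G9.5", "F43G9.3", "F43G9.4"]
orthologs = {
    "BMBAC01L03.2": "F43G9.5",
    "BMBAC01L03.5": "F43G9.3",
    "BMBAC01P19.3": "F43G9.4",
}


def projected_order(order: list[str], ortholog_map: dict[str, str]) -> list[str]:
    """Map each gene to its ortholog, skipping genes with no ortholog in the cluster."""
    return [ortholog_map[g] for g in order if g in ortholog_map]


def order_conserved(order_a: list[str], order_b: list[str],
                    ortholog_map: dict[str, str]) -> bool:
    """True if order_a, projected through the ortholog map, matches order_b
    read in either direction (orientation of individual genes is ignored)."""
    proj = projected_order(order_a, ortholog_map)
    return proj == order_b or proj == order_b[::-1]


print(projected_order(bm_order, orthologs))            # ['F43G9.5', 'F43G9.3', 'F43G9.4']
print(order_conserved(bm_order, ce_order, orthologs))  # True
```

The two non-orthologous B. malayi genes simply drop out of the projection, which is why the cluster still counts as conserved.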

The second cluster also involves three genes. Proteins predicted from both alternative transcripts of BMBAC01P19.2 were found to be homologous to large proteins from Homo sapiens (BAF180, AAG34760 [27]), Gallus gallus (JC5056 [28]), D. melanogaster (CG11375, AAF56339), and C. elegans (C26C6.1) (Figure 3). These proteins shared six bromodomains (PF00439), two BAH domains (bromo-adjacent homology, PF01426), an HMG box (high mobility group, PF00505), and an anonymous carboxy-terminal domain (PFB007669). The B. malayi, C. elegans, and D. melanogaster polybromodomain (PBR) proteins also contain two C2H2 zinc fingers. PBR proteins may be involved in chromatin-remodeling complexes. Bromodomains bind acetylated lysine in histone complexes, while HMG boxes are found in chromatin proteins that bind to single-stranded DNA and unwind double-stranded DNA. Human BAF180 has been shown to localize to the kinetochores of mitotic chromosomes [27]. None of the vertebrate PBR homologs contains zinc fingers, which may indicate additional functions for the nematode and fly proteins.

Figure 3. The pbr synteny cluster and pbr homologs in other species. The genomic organization of the pbr synteny cluster in C. elegans and B. malayi, and the domain structure of the PBR homologs in Drosophila melanogaster, Gallus gallus, and Homo sapiens are illustrated. Intron/exon boundaries that are conserved between the nematodes are indicated by asterisks. White boxes represent the contiguous DNA underlying the gene models.

Two conserved genes were identified immediately upstream of pbr-1 (Figure 3). BMBAC01P19.5, named Bm-ubr-1 (upstream of pbr-1), showed significant similarity only to T28F4.4 from C. elegans (27% identity). The protein encoded by BMBAC01P19.4 is homologous to C. elegans T28F4.5 (30% identity). Iterative searches of GenBank using PSI-BLAST [29] indicated that BMBAC01P19.4 and T28F4.5 belong to a group of small peptides that includes human DAP-1 (death-associated protein). DAP-1 is a nuclear protein and a positive regulator of interferon gamma-induced apoptosis in HeLa cells [30]. PSORT profiling indicated that both nematode proteins may have a nuclear localization. BMBAC01P19.2 (Bm-pbr-1) and BMBAC01P19.5 (Bm-ubr-1) are divergently transcribed, and BMBAC01P19.4 (Bm-dap-1) lies within the large third intron of BMBAC01P19.5, in the same transcriptional orientation as BMBAC01P19.2 (Figure 3). In the C. elegans instance of the PBR cluster, C26C6.1 (Ce-pbr-1) and T28F4.4 (Ce-ubr-1) are also divergently transcribed, from a 1,233 bp intergenic region. The third gene, T28F4.5 (Ce-dap-1), lies within the large third intron of T28F4.4, on the same strand as C26C6.1.

Comparison of the intergenic and upstream regions of both clusters, and of the orthologous gene pairs, did not reveal any clear motifs that might be involved in transcriptional regulation. In particular, the intergenic DNA between pbr-1 and ubr-1, and the first intron of ubr-1, had less than 30% pairwise identity throughout, with no stretches of higher identity. The greater AT richness of the B. malayi genome relative to C. elegans may obscure any conserved elements. No RNA-coding genes were found. Two B. malayi ESTs matched at > 99.5% identity to two regions of BMBAC01P19, 200 bp apart, that were not predicted to be part of a transcript (see Figure 1). These regions lie downstream of gene BMBAC01P19.3 and may derive from alternative 3' untranslated regions: the most downstream match includes a good candidate polyadenylation signal. The 3' end of the cDNA determined for this gene may have resulted from internal oligo(dT) priming at an A-rich segment of the 3' untranslated region.

Fractured synteny between the genomes of B. malayi and C. elegans

All of the C. elegans orthologs except Y56A3A.3 (Ce-mif-1, 41% identity to Bm-mif-1, on chromosome III) are located on chromosome I (Figure 4). F13G3.9 (Ce-mif-3, 23% identity to Bm-mif-1) lies on C. elegans chromosome I, close to the orthologs of B. malayi genes BMBAC01P19.2, .4, and .5. This could suggest that our orthology assignment is wrong. As described above, however, Ce-mif-1 and Bm-mif-1 share two intron positions and are more similar to each other than either is to Ce-mif-3, which has one concordant and one discordant intron position. The conflict between location and structure could be due to a gene-conversion event in either lineage, or to a directed transposition or insertion event.

Figure 4. Comparison of linkage and synteny with C. elegans. The B. malayi contig is compared to an approximately 9 Mb segment of C. elegans chromosome I. The relative positions of the ortholog pairs, colored as in Figure 1, are indicated. The link between Bm-mif-1 and Ce-mif-3 (F13G3.9) is dashed to indicate that these two genes are paralogs rather than orthologs (see text for details).

Eight of the 10 remaining C. elegans orthologs lay within a 2.3 Mb region in the center of chromosome I (6.7-9 Mb) (Figure 4). The orthologs of the other two genes (BMBAC01L03.4 and BMBAC01P19.6) are found at the distal tip of chromosome I. Although gene order has been extensively rearranged, 10 of the B. malayi genes had the same relative transcriptional orientation as their C. elegans orthologs. Examination of the boundaries of the C. elegans cluster and of individual gene regions did not show any association with repeat-sequence classes, including those shown to be commonly associated with rearrangements between C. elegans and C. briggsae [6].

Genome survey sequence comparison and synteny

To ascertain whether the sequenced segment was representative of the relationship between the B. malayi and C. elegans genomes, we surveyed the B. malayi BAC-end genome survey sequences (GSSs; J. Daub, C. Whitton, N.H., M. Quail and M.L.B., unpublished observations). There are over 18,000 GSSs from B. malayi, derived from three independent libraries. Each BAC-end sequence was compared to the C. elegans proteome (Wormpep [31]), and significant similarities (BLASTX probabilities < e-8) were recorded. The chromosomal position of each matching C. elegans protein was obtained from WormBase [32]. Under these conditions, 164 BACs had matches to C. elegans proteins at both ends (summarized in Table 2, details in Table 3). These matches are not necessarily to orthologs, as we have not analyzed each one in detail. However, matches to members of gene families or to shared domains amount to random picks from the proteome and so should not inflate the estimate of linkage: although much of the C. elegans proteome consists of protein families, very few of these have a chromosomally restricted distribution [33,34].
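The BAC-end filtering step described above can be sketched as follows. This is a hypothetical illustration, not the pipeline actually used: the `EndHit` record, the `T7`/`SP6` end labels, and the example hits are invented. Only the e-8 significance cutoff and the rule "keep BACs with significant matches at both ends, then compare the chromosomes of the two C. elegans matches" come from the text.

```python
from dataclasses import dataclass

E_CUTOFF = 1e-8  # significance threshold quoted in the text (BLASTX < e-8)


@dataclass
class EndHit:
    """Hypothetical record: one BLASTX hit of a BAC-end read to a C. elegans protein."""
    bac: str          # BAC clone name
    end: str          # which end of the insert, e.g. "T7" or "SP6" (hypothetical labels)
    chromosome: str   # C. elegans chromosome carrying the matching gene
    position_bp: int  # position of the matching gene on that chromosome
    evalue: float


def classify_pairs(hits: list[EndHit]) -> tuple[int, int, list[int]]:
    """Count BACs whose two ends hit the same vs different C. elegans chromosomes,
    and collect the C. elegans distances for the same-chromosome pairs."""
    best: dict[str, dict[str, EndHit]] = {}
    for h in hits:
        if h.evalue >= E_CUTOFF:
            continue  # not significant
        ends = best.setdefault(h.bac, {})
        # keep only the most significant hit for each end
        if h.end not in ends or h.evalue < ends[h.end].evalue:
            ends[h.end] = h
    same, diff, distances = 0, 0, []
    for ends in best.values():
        if len(ends) != 2:
            continue  # require a significant match at both ends
        a, b = ends.values()
        if a.chromosome == b.chromosome:
            same += 1
            distances.append(abs(a.position_bp - b.position_bp))
        else:
            diff += 1
    return same, diff, distances


# Invented example hits, for illustration only
hits = [
    EndHit("BAC_A", "T7", "I", 7_000_000, 1e-20),
    EndHit("BAC_A", "SP6", "I", 9_500_000, 1e-12),
    EndHit("BAC_B", "T7", "III", 1_000_000, 1e-15),
    EndHit("BAC_B", "SP6", "V", 2_000_000, 1e-30),
    EndHit("BAC_C", "T7", "X", 500_000, 1e-5),   # fails the cutoff
    EndHit("BAC_C", "SP6", "X", 600_000, 1e-25),
]
same, diff, dist = classify_pairs(hits)
print(same, diff, dist)  # 1 1 [2500000]
```

BAC_C is excluded because only one of its ends passes the cutoff; the distances collected for same-chromosome pairs correspond to the C. elegans separations summarized in the next section.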

Table 2. Synteny conservation between B. malayi BAC-end genome survey sequences and C. elegans genome sequence

Table 3. B. malayi BAC end comparisons to C. elegans

C. elegans has six chromosomes. Under a minimal model, in which the chance that a rearrangement is within-chromosome depends only on the fraction of the genome that lies on the same chromosome (with chromosomes of roughly equal size), we would expect approximately five of every six rearrangements to be between-chromosome events and one in six to be within-chromosome events. This model ignores the fact that B. malayi has only five chromosome pairs: four autosomes and one XY pair. The derivation of the two karyotypes is unknown and cannot be deduced from phylogenetic comparisons (see [35]). While most nematodes of clade V have six chromosomes like C. elegans, other taxa in the Secernentea have from one to more than 100 [36]. If we assume instead that the C. elegans complement derives from the splitting of an ancestral chromosome retained in B. malayi, the expectation would be that 20% (one in five) of rearrangements would be within-chromosome events.

Significantly more BACs had both ends matching genes on the same C. elegans chromosome than would be expected under either model (approximately 55%; χ2 test, p < 0.01 for all comparisons in Table 2). The mean distance between the matching C. elegans genes was 4.4 Mb, compared with an expected separation of approximately 45 kb between the B. malayi BAC ends.
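As a rough illustration of why this excess is significant, a Pearson χ2 goodness-of-fit test with one degree of freedom compares the observed same-chromosome count with the one-in-six and one-in-five expectations of the two models. The observed count below (90 of 164, about 55%) is an illustrative approximation built from the rounded figures in the text, not a value taken from Table 2; 6.635 is the standard critical value for 1 d.f. at p = 0.01.

```python
def chi_square_1df(observed_same: int, n: int, p_same: float) -> float:
    """Pearson goodness-of-fit statistic (1 d.f.) for a same/different-chromosome split."""
    exp_same = n * p_same
    exp_diff = n - exp_same
    obs_diff = n - observed_same
    return ((observed_same - exp_same) ** 2 / exp_same
            + (obs_diff - exp_diff) ** 2 / exp_diff)


CRITICAL_1DF_P01 = 6.635  # chi-square critical value, 1 d.f., p = 0.01

n_bacs = 164                          # BACs with matches at both ends (from the text)
observed_same = round(0.55 * n_bacs)  # illustrative: ~55% -> 90 BACs

for model, p_same in [("six chromosomes", 1 / 6), ("five chromosomes", 1 / 5)]:
    stat = chi_square_1df(observed_same, n_bacs, p_same)
    print(f"{model}: chi2 = {stat:.1f}, p < 0.01: {stat > CRITICAL_1DF_P01}")
```

Both statistics exceed the critical value by a wide margin, so the conclusion does not hinge on which karyotype model is assumed.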

Discussion

B. malayi is a human parasite only distantly related to the model nematode C. elegans [14,37]; therefore, genome comparisons between these species will yield data concerning longer-term changes in structure and function that cannot be derived from within-genus comparisons. In the 83 kb of genomic DNA flanking the B. malayi mif-1 locus we found a fractured conservation of microsynteny between the two nematode genomes, and conservation of linkage. Twelve protein-coding genes were predicted, and 11 of these had putative orthologs in the C. elegans genome. Ten of these orthologs were on C. elegans chromosome I, with eight in a 2.3 Mb segment in the center of the chromosome and two at the distal tip of chromosome I. Some of these genes have remained tightly linked in the same or slightly modified relative transcriptional orientations in both species.

This pattern, of conservation of linkage with disruption of precise synteny, was confirmed using BAC-end sequences. Of the 171 clones with matches at both ends to C. elegans genes, over 55% were localized to the same chromosome in C. elegans. While the mean distance separating the B. malayi genes is 45 kb (the length of the BAC clones; [38] and C. Whitton and M.L.B., unpublished work), the mean distance between the matching C. elegans genes is approximately 4.4 Mb.

The 83 kb fragment of B. malayi genomic DNA is the largest contiguated portion of sequenced genomic DNA from a non-rhabditid nematode described to date. A large proportion (around 60%) of genes identified in the B. malayi EST dataset (23,000 ESTs corresponding to around 8,300 unique transcripts [39]) have no close C. elegans homologue [16]. In this study, however, C. elegans orthologs were identified for 11 of the 12 identified B. malayi genes. Some of these orthologous pairs were confirmed by congruence in length of open reading frame and shared intron positions, despite low pairwise identity. Global searches with ESTs would not have detected these pairs (BLAST probability values of approximately e-4), and thus the true proportion of B. malayi unique genes is likely to be less than 60%. B. malayi genes were found to have larger and more numerous introns than C. elegans genes (2.2 times longer and 1.7 times more frequent), in keeping with previous estimates made using data from several highly expressed genes [40]. If the contig is representative and gene complement is equivalent to C. elegans, the B. malayi genome may be larger (120-140 Mb) than estimated previously (100 Mb [41]). Four of seven genes confirmed by reverse transcriptase PCR had alternative transcripts, a figure consistent with C. elegans EST and cDNA projects [42]. Additionally, five genes had B. malayi EST matches, a proportion congruent with the estimate that the EST program has identified around 40% of the expected 20,000 B. malayi genes [16].
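The revised genome-size estimate is consistent with simple arithmetic. If the contig's density of one gene per about 6.9 kb held genome-wide and B. malayi had about 20,000 genes, the genome would be roughly 138 Mb. This is a sketch under exactly those two assumptions (uniform gene density and the 20,000-gene figure quoted above); the text does not spell out how the 120-140 Mb range was derived.

```python
def genome_size_mb(n_genes: int, kb_per_gene: float) -> float:
    """Genome size (Mb) implied by a uniform gene density."""
    return n_genes * kb_per_gene / 1000


density_kb = 82_805 / 12 / 1000  # one gene per ~6.9 kb on the sequenced contig

print(f"{genome_size_mb(20_000, density_kb):.0f} Mb")  # 138 Mb
# Gene count that a 100 Mb genome would imply at the same density
print(round(100_000 / density_kb))                    # 14492
```

Read the other way, the earlier 100 Mb estimate would imply only about 14,500 genes at this density, well below the expected C. elegans-like complement.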

Conserved linkage between the genomes of closely related eukaryotic organisms has been shown in several taxa. But it is only recently, with the sequencing of discrete segments or whole genomes, that examples of conservation of microsynteny between the genomes of distantly related species (not involving functionally related genes) have been described [43,44]. The microsyntenic gene clusters retained between C. elegans and B. malayi do not fall into any clear functional categories. However, all genes contained in the second cluster (BMBAC01P19.2, .4, and .5) are predicted to have nuclear localization signals and could be co-regulated. Alternatively, promoters or cis-acting regulatory elements required for their proper function could be embedded within other cluster members. Interdigitation of these regulatory elements could be constraining the movement of genes away from this cluster. No conserved motifs were found, however, and this possibility can thus only be tested by transgenesis experiments. This phenomenon has been observed in other systems such as fungal genomes, where gene pairs predicted to have overlapping regulatory elements are more likely to be conserved between species [45].

Many genes in C. elegans are co-transcribed in operons [46,47] and this could constrain synteny breakage. The C. elegans orthologs of BMBAC01L03.5 and BMBAC01P19.3 are separated by 501 bp, an intergenic distance found in other C. elegans operons, and the downstream gene (Ce-F43G9.4) was shown to be trans-spliced to the SL2 spliced leader, a feature of downstream genes in C. elegans operons [47]. However, in B. malayi, BMBAC01L03.5 and BMBAC01P19.3 are separated by 2.8 kb, which is outside the range of operon intergenic spacing. The functions of C. elegans genes on chromosome I have been investigated by RNA-mediated interference and a phenotype was identified for one gene in each cluster: embryonic lethality (F39G4.5 [48]) and altered adult morphology (C26C6.1 [49]). Therefore, it is possible that the clusters are conserved because removing other members would interfere with functions of these essential genes. The one exception to the conservation of linkage is the Bm-mif-1/Ce-mif-1 ortholog pair. Another C. elegans MIF homolog, Ce-mif-3, is found in close proximity to the genes in the pbr-1 synteny cluster, raising the possibility that a gene-conversion event may have obscured orthology assignment for this gene.

In the Metazoa, long-range synteny between the genomes of distantly related species (>300 Myr divergence) has only been identified previously in vertebrates (teleost fish and humans [50,51]). In vertebrates, interchromosomal exchanges seem to be rare events, and some linkage groups, such as human chromosomes 6 and X, are conserved across most eutherian mammals [7]. From the analyses presented here we can suggest some general patterns of gene rearrangement in nematodes. Most of the C. elegans orthologs were located in a small segment of chromosome I (nine of eleven genes in 2.3 Mb or 16% of the chromosome), suggesting that local intrachromosomal inversions or rearrangements have occurred more frequently than long-range intrachromosomal, or interchromosomal rearrangements. This is consistent with patterns observed in closely related dipterans, where the composition of linkage groups is conserved but not the order within the chromosome. Mechanistically this may occur because intrachromosomal rearrangements require fewer DNA breaks than interchromosomal translocations, and the nuclear scaffold may hold local chromosomal regions in closer association. The high rate of rearrangement of genes within the nematode chromosomes makes it unlikely that the positional information of genes in the Caenorhabditis genomes will be useful in finding orthologous genes in the genomes of distantly related nematodes such as B. malayi.

Materials and methods

Identification of candidate genomic clones for sequencing

A probe for Bm-mif-1 was synthesized by labeling full-length cDNA (GenBank accession U88035) with biotin (Phototope; New England Biolabs), hybridized to high-density arrays of 18,000 BAC clones containing B. malayi genomic DNA [52], and detected with the Phototope detection kit (New England Biolabs). Hybridization-positive BACs were verified by PCR using the gene-specific primers Bm-MIF-1.F1a (ATGCCATATTTTACGATTGATAC) and Bm-MIF-1.R1a (GAACACCATCGCTTGTCCACC) under standard reaction and cycling conditions (0.2 mM dNTPs, 1.5 mM MgCl2, 0.5 pM primer; 1 cycle of 94°C for 3 min; 35 cycles of 94°C for 15 sec, 55°C for 20 sec, 72°C for 3 min; 1 cycle of 72°C for 10 min). BMBAC01P19 was selected for sequencing. Sequence from the T7 end of the insert was used to design the specific primers 01P19.T7.F1 (GCAGCAAATGCTTATTTGTCTTG) and 01P19.T7.R1 (GTTTGGTGATTCATGTCCATGAGC). Primers 01P19.T7.R1 and 2BiotinBACF3 (designed to the BAC vector; (biotinU)2GAGTCGACCTGCAGGCATGC; New England BioLabs Organic Synthesis Unit) were used to synthesize a biotin-labeled end probe. The probe was hybridized to the BAC library filter using a modified hybridization and detection protocol [38]. Positive BACs were verified by PCR with primers 01P19.T7.R1 and 01P19.T7.F1, and insert DNA was prepared using a kit (Qiagen). BAC ends were sequenced using the Sanger Institute protocol [53]. Of the positive clones, BMBAC01L03 showed the least overlap with BMBAC01P19 and was selected for sequencing.

Preparation, subcloning, and sequencing of BACs

The BACs were sequenced using a standard two-stage strategy: random sequencing of subcloned DNA followed by directed sequencing to resolve problem areas. In the first stage, DNA prepared from BAC clones was sheared by sonication, and fragments of 1.4-2 kb were cloned into pUC18. DNA from randomly selected clones was sequenced with dye-terminator chemistry and analyzed on automated sequencers. Each BAC was sequenced to sevenfold coverage. Contigs were assembled using phrap (Phil Green, Washington University Genome Sequencing Center, unpublished). Manual base calling and finishing were carried out using Gap4 [54]. Gaps and low-quality regions were resolved by techniques such as primer walking, PCR, and resequencing of clones under conditions that give increased read lengths.
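Under the idealized Lander-Waterman model, with reads placed uniformly at random, redundancy is c = NL/G (N reads of length L over an insert of G bp), and the expected fraction of bases covered by no read is e^-c. A minimal sketch for the two BAC inserts at sevenfold coverage follows; the 500 bp read length is an assumption, not a figure from the text, and real shotgun projects still leave gaps that must be closed by directed finishing, as described above.

```python
import math


def reads_needed(insert_bp: int, coverage: float, read_len_bp: int) -> int:
    """Reads N needed for redundancy c = N * L / G (rounded up)."""
    return math.ceil(coverage * insert_bp / read_len_bp)


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman expected fraction of bases covered by no read: e^-c."""
    return math.exp(-coverage)


READ_LEN = 500  # assumed mean usable read length (bp); not given in the text
COVERAGE = 7    # sevenfold, as stated in the text

for name, size in [("BMBAC01L03", 28_757), ("BMBAC01P19", 64_685)]:
    print(name, reads_needed(size, COVERAGE, READ_LEN),
          f"~{size * fraction_uncovered(COVERAGE):.0f} bp expected uncovered")
print(f"{fraction_uncovered(COVERAGE):.4%}")  # 0.0912%
```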

Sequence analysis

The finished sequences of BMBAC01P19 and BMBAC01L03 were compared with the GenBank nonredundant nucleic acid and protein databases, the EST database (dbEST), the C. elegans genome and proteome, and the custom B. malayi clustered EST database [16] using BLAST [55,56]. GeneFinder (P. Green and L. Hillier, Washington University Genome Sequencing Center, unpublished) was trained with 162 publicly available B. malayi gene sequences and used to analyze the contiguated sequence. The sequence was annotated on the Artemis workbench [57]. Predicted protein sequences were compared to Pfam [58], and cellular localization was examined using PSORTII [59]. The annotated sequence is available in GenBank (accession AL606837).

Verification of gene predictions

To confirm the gene predictions from BMBAC01P19, PCR was carried out with gene-specific primers on oligo(dT)-primed first-strand cDNA from mixed adult B. malayi. To isolate cDNA ends, the GeneRacer 3' RACE primer (Invitrogen) (GCTGTCAACGATACGCTACGTAACGGCATGACAGTG) or the nematode SL1 sequence (GGTTTAATTACCCAAGTTTGAG) was used in combination with gene-specific primers. Secondary PCRs were carried out using nested primers and 2% of the primary PCR product as template. Positive PCR products were cloned and sequenced.
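Products amplified with the SL1 primer should contain the 22-nucleotide SL1 sequence given above, followed by the trans-spliced 5' end of the transcript. A minimal sketch for locating SL1 in a sequenced product and returning the downstream sequence is shown below. It uses exact matching only and checks both strands. A real pipeline would also tolerate sequencing errors; the function names are hypothetical.

```python
"""Locate the SL1 spliced-leader sequence in a sequenced RACE product (sketch)."""
from typing import Optional

SL1 = "GGTTTAATTACCCAAGTTTGAG"  # nematode SL1 sequence, as given in the text

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def downstream_of_sl1(read: str) -> Optional[str]:
    """Return the sequence 3' of an exact SL1 match, checking both strands.

    Returns None if SL1 is not found exactly on either strand.
    """
    read = read.upper()
    for strand in (read, revcomp(read)):
        pos = strand.find(SL1)
        if pos != -1:
            return strand[pos + len(SL1):]
    return None
```

The returned sequence is the candidate 5' end of the mature transcript, which can then be aligned to the genomic sequence to place the trans-splice acceptor.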

BAC-end sequence analysis

The B. malayi BAC-end sequence dataset was compared with the C. elegans proteome in Wormpep [31]. Significant matches were filtered, and BAC clones with matches at both ends were retained. The chromosomal positions of the matched C. elegans genes were determined from WormBase [32].
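The filtering step can be sketched as follows. Given C. elegans matches for individual BAC ends, the sketch keeps clones whose two ends both have a significant match and tallies whether the two matched genes lie on the same C. elegans chromosome. The input layout, end labels, E-value cutoff, and function names are all hypothetical; the text does not specify them.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class EndMatch(NamedTuple):
    clone: str       # BAC clone name
    end: str         # insert end label, e.g. "T7" or "SP6" (hypothetical labels)
    gene: str        # matched C. elegans gene
    chromosome: str  # chromosome of that gene
    evalue: float


def paired_end_matches(matches: Iterable[EndMatch],
                       max_evalue: float = 1e-10) -> Dict[str, Tuple[EndMatch, EndMatch]]:
    """Return clones with a significant match at both ends (best match per end)."""
    by_clone: Dict[str, Dict[str, EndMatch]] = {}
    for m in matches:
        if m.evalue > max_evalue:
            continue  # hypothetical significance cutoff
        ends = by_clone.setdefault(m.clone, {})
        if m.end not in ends or m.evalue < ends[m.end].evalue:
            ends[m.end] = m  # keep the lowest E-value match for this end
    pairs: Dict[str, Tuple[EndMatch, EndMatch]] = {}
    for clone, ends in by_clone.items():
        if len(ends) == 2:
            first, second = ends.values()
            pairs[clone] = (first, second)
    return pairs


def chromosome_agreement(pairs: Dict[str, Tuple[EndMatch, EndMatch]]) -> Counter:
    """Count clones whose two ends match genes on the same vs different chromosomes."""
    return Counter(
        "same" if a.chromosome == b.chromosome else "different"
        for a, b in pairs.values()
    )
```

The "same" count can then be compared with the expectation under random placement, which depends on how many genes each of the six C. elegans chromosomes carries, to judge whether there is a bias toward intrachromosomal rearrangement.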

Acknowledgements

We thank the Filarial Genome Project for the B. malayi BAC library and clones, Yvonne Harcus, Janice Murray, William Gregory, and Rick Maizels for B. malayi materials, Jen Daub and Claire Whitton for BAC-end analysis, Dan Lawson for help with C. elegans genome queries, and New England BioLabs for reagents. Funding for this work was provided by the Medical Research Council. We acknowledge the support and hard work of sequencing team 14 at the Sanger Institute.

References

  1. Hentschel CC, Birnstiel ML: The organization and expression of histone gene families.

    Cell 1981, 25:301-313.

  2. Ferrier DE, Holland PW: Ancient origin of the Hox gene cluster.

    Nat Rev Genet 2001, 2:33-38.

  3. Litman GW, Rast JP, Shamblott MJ, Haire RN, Hulst M, Roess W, Litman RT, Hinds-Frey KR, Zilch A, Amemiya CT: Phylogenetic diversification of immunoglobulin genes and the antibody repertoire.

    Mol Biol Evol 1993, 10:60-72.

  4. Ohta Y, Okamura K, McKinney EC, Bartl S, Hashimoto K, Flajnik MF: Primitive synteny of vertebrate major histocompatibility complex class I and class II genes.

    Proc Natl Acad Sci USA 2000, 97:4712-4717.

  5. Ranz JM, Casals F, Ruiz A: How malleable is the eukaryotic genome? Extreme rate of chromosomal rearrangement in the genus Drosophila.

    Genome Res 2001, 11:230-239.

  6. Coghlan A, Wolfe KH: Fourfold faster rate of genome rearrangement in nematodes than in Drosophila.

    Genome Res 2002, 12:857-867.

  7. O'Brien SJ, Menotti-Raymond M, Murphy WJ, Nash WG, Wienberg J, Stanyon R, Copeland NG, Jenkins NA, Womack JE, Marshall Graves JA: The promise of comparative genomics in mammals.

    Science 1999, 286:458-462.

  8. Fulton RE, Salasek ML, DuTeau NM, Black WCT: SSCP analysis of cDNA markers provides a dense linkage map of the Aedes aegypti genome.

    Genetics 2001, 158:715-726.

  9. Kent WJ, Zahler AM: Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment.

    Genome Res 2000, 10:1115-1125.

  10. Caceres M, Ranz JM, Barbadilla A, Long M, Ruiz A: Generation of a widespread Drosophila inversion by a transposable element.

    Science 1999, 285:415-418.

  11. Caceres M, Puig M, Ruiz A: Molecular characterization of two natural hotspots in the Drosophila buzzatii genome induced by transposon insertions.

    Genome Res 2001, 11:1353-1364.

  12. Thomas WK, Wilson AC: Mode and tempo of molecular evolution in the nematode Caenorhabditis: cytochrome oxidase II and calmodulin sequences.

    Genetics 1991, 128:269-279.

  13. The C. elegans Sequencing Consortium: Genome sequence of the nematode C. elegans: a platform for investigating biology.

    Science 1998, 282:2012-2018.

  14. Vanfleteren JR, Van de Peer Y, Blaxter ML, Tweedie SA, Trotman C, Lu L, Van Hauwaert ML, Moens L: Molecular genealogy of some nematode taxa as based on cytochrome c and globin amino acid sequences.

    Mol Phylogenet Evol 1994, 3:92-101.

  15. McReynolds LA, DeSimone SM, Williams SA: Cloning and comparison of repeated DNA sequences from the human filarial parasite Brugia malayi and the animal parasite Brugia pahangi.

    Proc Natl Acad Sci USA 1986, 83:797-801.

  16. Blaxter M, Daub J, Guiliano DB, Parkinson J, Whitton C, The Filarial Genome Project: The Brugia malayi genome project: expressed sequence tags and gene discovery.

    Trans Roy Soc Trop Med Hyg 2001, in press.

  17. Williams SA, Lizotte-Waniewski MR, Foster J, Guiliano D, Daub J, Scott AL, Slatko B, Blaxter ML: The filarial genome project: analysis of the nuclear, mitochondrial and endosymbiont genomes of Brugia malayi.

    Int J Parasitol 2000, 30:411-419.

  18. Meyer BJ: Sex determination and X chromosome dosage compensation. In C. elegans II. Edited by Riddle DL, Blumenthal T, Meyer BJ, Priess JR. Plainview, New York: Cold Spring Harbor Laboratory Press; 1997:209-240.

  19. Sakaguchi Y, Tada I, Ash LR, Aoki Y: Karyotypes of Brugia pahangi and Brugia malayi (Nematoda: Filaroidea).

    J Parasitol 1983, 69:1090-1093.

  20. Pastrana DV, Raghavan N, FitzGerald P, Eisinger SW, Metz C, Bucala R, Schleimer RP, Bickel C, Scott AL: Filarial nematode parasites secrete a homologue of the human cytokine macrophage migration inhibitory factor.

    Infect Immun 1998, 66:5955-5963.

  21. Stec I, Wright TJ, van Ommen GJ, de Boer PA, van Haeringen A, Moorman AF, Altherr MR, den Dunnen JT: WHSC1, a 90 kb SET domain-containing gene, expressed in early development and homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn syndrome critical region and is fused to IgH in t(4;14) multiple myeloma.

    Hum Mol Genet 1998, 7:1071-1082.

  22. Stec I, Nagl SB, van Ommen GJ, den Dunnen JT: The PWWP domain: a potential protein-protein interaction domain in nuclear proteins influencing differentiation?

    FEBS Lett 2000, 473:1-5.

  23. Nishihira J: Macrophage migration inhibitory factor (MIF): its essential role in the immune system and cell growth.

    J Interferon Cytokine Res 2000, 20:751-762.

  24. Falcone FH, Loke P, Zang X, MacDonald AS, Maizels RM, Allen JE: A Brugia malayi homolog of macrophage migration inhibitory factor reveals an important link between macrophages and eosinophil recruitment during nematode infection.

    J Immunol 2001, 167:5348-5354.

  25. Marson AL, Tarr DEK, Scott AL: Macrophage migration inhibitory factor (mif) transcription is significantly elevated in Caenorhabditis elegans dauer larvae.

    Gene 2001, 278:53-62.

  26. Kleemann R, Rorsman H, Rosengren E, Mischke R, Mai NT, Bernhagen J: Dissection of the enzymatic and immunologic functions of macrophage migration inhibitory factor. Full immunologic activity of N-terminally truncated mutants.

    Eur J Biochem 2000, 267:7183-7193.

  27. Xue Y, Canman JC, Lee CS, Nie Z, Yang D, Moreno GT, Young MK, Salmon ED, Wang W: The human SWI/SNF-B chromatin-remodeling complex is related to yeast rsc and localizes at kinetochores of mitotic chromosomes.

    Proc Natl Acad Sci USA 2000, 97:13015-13020.

  28. Nicolas RH, Goodwin GH: Molecular cloning of polybromo, a nuclear protein containing multiple domains including five bromodomains, a truncated HMG-box, and two repeats of a novel domain.

    Gene 1996, 175:233-240.

  29. Altschul SF, Koonin EV: Iterated profile searches with PSI-BLAST: a tool for discovery in protein databases.

    Trends Biochem Sci 1998, 23:444-447.

  30. Deiss LP, Feinstein E, Berissi H, Cohen O, Kimchi A: Identification of a novel serine/threonine kinase and a novel 15-kD protein as potential mediators of the gamma interferon-induced cell death.

    Genes Dev 1995, 9:15-30.

  31. Wormpep [http://www.sanger.ac.uk/Projects/C_elegans/wormpep]

  32. WormBase [http://www.wormbase.org/]

  33. Friedman R, Hughes AL: Gene duplication and the structure of eukaryotic genomes.

    Genome Res 2001, 11:373-381.

  34. Lespinet O, Wolf YI, Koonin EV, Aravind L: The role of lineage-specific gene family expansion in the evolution of eukaryotes.

    Genome Res 2002, 12:1048-1059.

  35. Blaxter ML: Genes and genomes of Necator americanus and related hookworms.

    Int J Parasitol 2000, 30:347-355.

  36. Walton AC: Some parasites and their chromosomes.

    J Parasitol 1959, 45:1-20.

  37. Blaxter ML, De Ley P, Garey JR, Liu LX, Scheldeman P, Vierstraete A, Vanfleteren JR, Mackey LY, Dorris M, Frisse LM, et al.: A molecular evolutionary framework for the phylum Nematoda.

    Nature 1998, 392:71-75.

  38. Foster JM, Kamal IH, Daub J, Swan MC, Ingram JR, Ganatra M, Ware J, Guiliano D, Aboobaker A, Moran L, et al.: Hybridization to high-density filter arrays of a Brugia malayi BAC library with biotinylated oligonucleotides and PCR products.

    Biotechniques 2001, 30:1216-1218.

  39. Parkinson J, Whitton C, Guiliano D, Daub J, Blaxter ML: 200,000 nematode ESTs on the net.

    Trends Parasitol 2001, 17:394-396.

  40. Zang X, Yazdanbakhsh M, Jiang H, Kanost MR, Maizels RM: A novel serpin expressed by blood-borne microfilariae of the parasitic nematode Brugia malayi inhibits human neutrophil serine proteinases.

    Blood 1999, 94:1418-1428.

  41. Maina CV, Grandea AG III, Tuyen LTK, Asikin N, Williams SA, McReynolds LA: Dirofilaria immitis: Genomic complexity and characterisation of a structural gene. In Molecular Paradigms for Eradicating Helminthic Parasites. Edited by MacInnis AJ. New York: Alan Liss; 1987:193-204.

  42. Reboul J, Vaglio P, Tzellas N, Thierry-Mieg N, Moore T, Jackson C, Shin IT, Kohara Y, Thierry-Mieg D, Thierry-Mieg J, et al.: Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans.

    Nat Genet 2001, 27:332-336.

  43. Brunner B, Todt T, Lenzner S, Stout K, Schulz U, Ropers HH, Kalscheuer VM: Genomic structure and comparative analysis of nine Fugu genes: conservation of synteny with human chromosome Xp22.2-p22.1.

    Genome Res 1999, 9:437-448.

  44. Hamer L, Pan H, Adachi K, Orbach MJ, Page A, Ramamurthy L, Woessner JP: Regions of microsynteny in Magnaporthe grisea and Neurospora crassa.

    Fungal Genet Biol 2001, 33:137-143.

  45. Huynen MA, Snel B, Bork P: Inversions and the dynamics of eukaryotic gene order.

    Trends Genet 2001, 17:304-306.

  46. Blumenthal T, Steward K: RNA processing and gene structure. In C. elegans II. Edited by Riddle DL, Blumenthal T, Meyer BJ, Priess JR. Plainview, New York: Cold Spring Harbor Laboratory Press; 1997:117-146.

  47. Blumenthal T, Evans D, Link CD, Guffanti A, Lawson D, Thierry-Mieg J, Thierry-Mieg D, Chiu WL, Duke K, Kiraly M, Kim SK: A global analysis of Caenorhabditis elegans operons.

    Nature 2002, 417:851-854.

  48. Maeda I, Kohara Y, Yamamoto M, Sugimoto A: Large-scale analysis of gene function in Caenorhabditis elegans by high-throughput RNAi.

    Curr Biol 2001, 11:171-176.

  49. Fraser AG, Kamath RS, Zipperlen P, Martinez-Campos M, Sohrmann M, Ahringer J: Functional genomic analysis of C. elegans chromosome I by systematic RNA interference.

    Nature 2000, 408:325-330.

  50. Grant D, Cregan P, Shoemaker RC: Genome organization in dicots: genome duplication in Arabidopsis and synteny between soybean and Arabidopsis.

    Proc Natl Acad Sci USA 2000, 97:4168-4173.

  51. Ku HM, Vision T, Liu J, Tanksley SD: Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny.

    Proc Natl Acad Sci USA 2000, 97:9121-9126.

  52. Guiliano D, Ganatra M, Ware J, Parrot J, Daub J, Moran L, Brennecke H, Foster JM, Supali T, Blaxter M, et al.: Chemiluminescent detection of sequential DNA hybridizations to high-density, filter-arrayed cDNA libraries: a subtraction method for novel gene discovery.

    Biotechniques 1999, 27:146-152.

  53. End sequencing protocol [http://www.sanger.ac.uk/Teams/Team51/PACBACPrep.shtml]

  54. Organization of the Gap4 manual [http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_unix_1.html]

  55. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool.

    J Mol Biol 1990, 215:403-410.

  56. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

    Nucleic Acids Res 1997, 25:3389-3402.

  57. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation.

    Bioinformatics 2000, 16:944-945.

  58. Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer EL: The Pfam protein families database.

    Nucleic Acids Res 2000, 28:263-266.

  59. Nakai K, Kanehisa M: A knowledge base for predicting protein localization sites in eukaryotic cells.

    Genomics 1992, 14:897-911.

  60. NemBase [http://www.nematodes.org/]

  61. Lizotte-Waniewski M, Tawe W, Guiliano DB, Lu W, Liu J, Williams SA, Lustigman S: Identification of potential vaccine and drug target candidates by expressed sequence tag analysis and immunoscreening of Onchocerca volvulus larval cDNA libraries.

    Infect Immun 2000, 68:3491-3501.