<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2004-5-4-r25</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>The regulatory content of intergenic DNA shapes genome architecture</p>
			</title>
			<aug>
				<au id="A1" ca="yes" ce="yes">
					<snm>Nelson</snm>
					<mi>E</mi>
					<fnm>Craig</fnm>
					<insr iid="I1"/>
					<email>craignelson@wisc.edu</email>
				</au>
				<au id="A2" ce="yes">
					<snm>Hersh</snm>
					<mi>M</mi>
					<fnm>Bradley</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A3">
					<snm>Carroll</snm>
					<mi>B</mi>
					<fnm>Sean</fnm>
					<insr iid="I1"/>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Howard Hughes Medical Institute, University of Wisconsin-Madison, 1525 Linden Drive, Madison, WI 53703, USA</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2004</pubdate>
			<volume>5</volume>
			<issue>4</issue>
			<fpage>R25</fpage>
			<url>http://genomebiology.com/2004/5/4/R25</url>
			<xrefbib>
				<pubid idtype="pmpid">15059258</pubid>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>3</day>
					<month>12</month>
					<year>2003</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>9</day>
					<month>1</month>
					<year>2004</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>8</day>
					<month>2</month>
					<year>2004</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>15</day>
					<month>3</month>
					<year>2004</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2004</year>
			<collab>Nelson et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
		</cpyrt>
		<shorttitle>
			<p>The regulatory content of intergenic DNA shapes genome architecture</p>
		</shorttitle>
		<shortabs>
			<p>The relationship between regulatory complexity and gene spacing was examined in <it>Caenorhabditis elegans </it>and <it>Drosophila melanogaster</it>. Intergenic distance, and hence genome architecture, is shaped by regulatory information contained in noncoding DNA.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Factors affecting the organization and spacing of functionally unrelated genes in metazoan genomes are not well understood. Because of the vast size of a typical metazoan genome compared to known regulatory and protein-coding regions, functional DNA is generally considered to have a negligible impact on gene spacing and genome organization. In particular, it has been impossible to estimate the global impact, if any, of regulatory elements on genome architecture.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>To investigate this, we examined the relationship between regulatory complexity and gene spacing in <it>Caenorhabditis elegans </it>and <it>Drosophila melanogaster</it>. We found that gene density directly reflects local regulatory complexity, such that the amount of noncoding DNA between a gene and its nearest neighbors correlates positively with that gene's regulatory complexity. Genes with complex functions are flanked by significantly more noncoding DNA than genes with simple or housekeeping functions. Genes of low regulatory complexity are associated with approximately the same amount of noncoding DNA in <it>D. melanogaster </it>and <it>C. elegans</it>, while loci of high regulatory complexity are significantly larger in the more complex animal. Complex genes in <it>C. elegans </it>have larger 5' than 3' noncoding intervals, whereas those in <it>D. melanogaster </it>have roughly equivalent 5' and 3' noncoding intervals.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>Intergenic distance, and hence genome architecture, is highly nonrandom and is shaped by regulatory information contained in noncoding DNA. Our findings suggest that in compact genomes, the species-specific loss of nonfunctional DNA reveals a landscape of regulatory information by leaving a profile of functional DNA in its wake.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010016">Molecular biology</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Many basic issues regarding the organization of regulatory DNA remain unresolved. We do not know what proportion of any genome consists of regulatory DNA. We do not understand the factors that govern the size, distance and orientation of regulatory elements relative to coding regions. Nor do we usually know the identity of the many transcription factors that bind any given element. For these reasons, it has been difficult to assess the impact of regulatory DNA on metazoan genome architecture.</p>
			<p>Nevertheless, it is clear that metazoan genomes are not completely random assortments of genic and non-genic sequence. Genomes possess higher-order physical organization, including structural motifs such as centromeres and telomeres, reasonably distinct domains of heterochromatin and euchromatin <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, and less well-defined regions with biased base composition, such as isochores <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Various functional states have been correlated with these organizational groupings. GC-rich isochores, for instance, are relatively gene dense <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, and genes within these isochores tend to be more highly transcribed <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> than genes in less GC-rich regions of the genome.</p>
			<p>Metazoan genomes also contain physical clusters of co-regulated genes. Highly conserved, tightly regulated clusters include the Hox genes, which specify anterior-posterior pattern in all bilaterians <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Other clusters that are more loosely arranged include human housekeeping genes <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, testis-specific genes in <it>Drosophila melanogaster </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and muscle-specific genes in <it>Caenorhabditis elegans </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. These observations suggest that the typical metazoan genome has more fine-scale architecture than is readily apparent. However, the vast majority of metazoan genes are not located in any known cluster and so it remains unclear whether or how these genes are organized. Furthermore, the majority of coexpressed clusters identified in <it>D. melanogaster </it>do not share common functional annotations, suggesting that the apparent coexpression of physically clustered genes may be the result of increased local accessibility of promoters in opened chromatin, rather than explicit regulatory similarity <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
			<p>Despite sharing structural and organizational features, metazoan genomes vary in total size (C value) across several orders of magnitude <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Several explanations for this variation have been proposed. Noncoding, repetitive DNA elements, such as transposons, satellites and simple sequence repeats, can account for some fraction of genome size difference <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. An extension of this model suggests that genome size is determined by the balance between insertions, such as rare bouts of invasion by self-replicating elements, and deletions of nonfunctional DNA from the genome <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Such mutational models of genome size can be contrasted to adaptive models, which suggest that selective constraints act on overall genome size, largely independent of any specific informational content of the DNA. For example, genome size and cell size are significantly correlated <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. This correlation may influence the developmental rate and developmental complexity of an organism and thereby exert selective pressure on overall genome size <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
			<p>While both mutational and adaptive models contribute to our understanding of metazoan genome size, neither addresses an important aspect of DNA function - the regulation of gene expression - and its possible effect on genome size and architecture. The effect of regulatory DNA on genome architecture has been ignored largely because of the difficulty of identifying regulatory elements and the general assumption that most intergenic DNA is nonfunctional. However, in lineages that have experienced high rates of DNA loss it is possible that the spatial requirements of regulatory DNA could shape intergenic distance and hence genome architecture. Here we examine how regulatory DNA influences gene distribution in two distantly related animals, <it>D. melanogaster </it>and <it>C. elegans</it>. We compare the regulatory complexity of a large sample of the genes from each animal with the spacing of these genes within each genome. We find a positive correlation between the inferred regulatory complexity of a gene and the distance from that gene to its nearest neighbor. We also find that while genes with common housekeeping functions occupy approximately the same amount of space in both <it>D. melanogaster </it>and <it>C. elegans</it>, genes that play a central role in development and pattern formation occupy significantly more space in <it>D. melanogaster</it>. Finally, it appears that <it>C. elegans </it>partitions its regulatory information upstream of the promoter, whereas no strong bias is apparent in <it>D. melanogaster</it>. We suggest that the interplay between the relatively high rate of nonfunctional DNA loss and selective pressure to maintain minimal spatial requirements for essential genetic regulatory information shapes genome architecture in these taxa.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Genomes contain relatively few genes with highly complex expression patterns</p>
				</st>
				<p>Because we cannot directly measure regulatory complexity, we developed surrogate measurements for the regulatory complexity associated with individual genes. In many cases, complex expression patterns are composed of separable tissue-specific or spatially specific subpatterns, each of which is driven by a discrete <it>cis</it>-regulatory element (see for example <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>). Thus, genes expressed in a greater number of tissues and spatial domains tend to require a greater number of regulatory elements to drive this expression (see for example <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>). Accordingly, we use the complexity of a gene's expression pattern as a surrogate for its regulatory complexity.</p>
				<p>In this study we measured complexity of expression pattern in two ways. First, we surveyed the curated literature-based resources of FlyBase and WormBase and generated an expression complexity index from each. FlyBase and WormBase contain information on expression pattern and mutant phenotype for every gene that has been studied in each animal. Our FlyBase index (FBx) counts domains of gene expression and tissues affected in mutant larvae, adults and embryos. FlyBase contains information on 1,879 of the 13,370 predicted genes in the euchromatic portion of the <it>D. melanogaster </it>genome, from which we generated FBx values. WormBase contains expression pattern entries for 1,125 of the 19,614 predicted genes in the <it>C. elegans </it>genome, from which we generated WormBase index (WBx) values. Our second measure for complexity of expression pattern was obtained from the Berkeley <it>Drosophila </it>Genome Project (BDGP) <it>in situ </it>hybridization (ISH) project <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Using a random, nonredundant set of expressed sequence tags as probes, this project is systematically surveying gene expression during <it>D. melanogaster </it>embryogenesis. Annotation of the 1,728 genes surveyed (as of October 2003) was used to generate our BDGP index values (BDGPx).</p>
				<p>These indices survey the complexity of gene expression patterns in approximately 14% (FBx) and approximately 13% (BDGPx) of <it>D. melanogaster </it>genes (3,156 unique genes, ~24% of the total predicted gene set), and approximately 6% of <it>C. elegans </it>genes (WBx). All three distributions contain many genes that have a low expression complexity value and far fewer genes that have a high expression complexity value (Figure <figr fid="F1">1</figr>). This result indicates that most of the genes in these genomes are deployed in a small number of tissues, whereas a small set of genes is used repeatedly in specific tissues at specific times. Therefore, most genes in these animals are likely to require a small number of <it>cis</it>-regulatory elements, whereas a much smaller group is likely to require large arrays of regulatory elements.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Genes of low regulatory complexity are common and genes of high regulatory complexity are rare in <it>D. melanogaster </it>and <it>C. elegans</it></p>
					</caption>
					<text>
						<p>Genes of low regulatory complexity are common and genes of high regulatory complexity are rare in <it>D. melanogaster </it>and <it>C. elegans</it>. Distribution of genes with respect to complexity of expression in <b>(a) </b>FlyBase index (FBx), <b>(b) </b>BDGP <it>in situ </it>hybridization index (BDGPx), and <b>(c) </b>WormBase index (WBx). In all three cases, the distributions are heavily weighted toward genes expressed in a small number of locations and show relatively few genes deployed in a large number of tissues.</p>
					</text>
					<graphic file="gb-2004-5-4-r25-1"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Regulatory complexity and gene spacing</p>
				</st>
				<p>To accommodate a large number of separate regulatory elements, organisms could employ two basic approaches. They could increase the density of regulatory elements - that is, increase the informational content, but maintain overall size of a regulatory region (as in viruses). Alternatively, they could add elements by expanding the physical size of a regulatory region - that is, maintain the density of information, and increase the space occupied by that regulatory information. If a regulatory element requires a minimal threshold of physical space, then genes with a complex expression pattern that require more regulatory elements will also require more physical space in the genome to contain those elements. Therefore, we determined whether there is a correlation between regulatory complexity (as estimated by our expression complexity indices) and the amount of noncoding DNA flanking each gene.</p>
				<p>We determined intergenic distance for all genes in the euchromatic portions of the <it>D. melanogaster </it>and <it>C. elegans </it>genomes (intergenic distance is defined as the sum of upstream and downstream distance to the nearest neighboring genes; see Materials and methods for details) and compared this distance to each gene's expression index value. For each of the three expression indices we divided index values into bins containing roughly 10% of the genes in each sample and plotted the mean intergenic distance for each bin (division of the data into precise 10% bins was constrained by integral data values; see Materials and methods for details). We found that intergenic distance is positively correlated with expression diversity (FBx, Pearson <it>r </it>= 0.23, least-squares linear regression <it>r</it><sup>2 </sup>= 0.05, <it>p </it>&lt; 0.0001; BDGPx, <it>r </it>= 0.13, <it>r</it><sup>2 </sup>= 0.02, <it>p </it>&lt; 0.0001; WBx, <it>r </it>= 0.19, <it>r</it><sup>2 </sup>= 0.04, <it>p </it>&lt; 0.0001). More intergenic DNA flanks bins of genes inferred to have greater regulatory complexity than bins inferred to have low regulatory complexity (Tukey-Kramer HSD, &#945; = 0.05; see Figure <figr fid="F2">2</figr> and Materials and methods). This is true in both <it>D. melanogaster </it>and <it>C. elegans</it>, regardless of the index used to estimate regulatory complexity (literature-derived or <it>in situ</it>-derived).</p>
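				<p>As an illustration only, the two quantities used above can be sketched in Python. The sketch assumes that genes on one chromosome are supplied as hypothetical (name, start, end) tuples that do not overlap, skips the first and last gene on the chromosome because each lacks a neighbor on one side, and computes the Pearson coefficient directly from its definition. The function names are illustrative and this is not the procedure described in Materials and methods.</p>

```python
from math import sqrt


def intergenic_distances(genes):
    """Sum the gaps to the two nearest neighbors for each interior gene.

    genes: list of (name, start, end) tuples on one chromosome
    (hypothetical input format; genes are assumed not to overlap).
    Returns a dict mapping gene name to upstream gap plus downstream gap.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    totals = {}
    # The first and last genes have only one neighbor, so they are skipped.
    for i in range(1, len(ordered) - 1):
        name, start, end = ordered[i]
        left_gap = max(0, start - ordered[i - 1][2])
        right_gap = max(0, ordered[i + 1][1] - end)
        totals[name] = left_gap + right_gap
    return totals


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy coordinates, not data from this study.
toy_genes = [("a", 0, 100), ("b", 300, 400), ("c", 1000, 1200), ("d", 1250, 1300)]
toy_distances = intergenic_distances(toy_genes)
```

				<p>Because the two flanking gaps are summed, the total in this sketch does not depend on the orientation of the gene.</p>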
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Intergenic DNA increases with regulatory complexity in <it>D. melanogaster </it>and <it>C. elegans</it></p>
					</caption>
					<text>
						<p>Intergenic DNA increases with regulatory complexity in <it>D. melanogaster </it>and <it>C. elegans</it>. Expression indices were divided into bins, each containing approximately 10% of the entries in an index. Mean amount of intergenic DNA for each bin (&#177; standard error) was plotted for all three expression indices (left): <b>(a) </b>FBx; <b>(b) </b>BDGPx; <b>(c) </b>WBx. The average amount of intergenic DNA flanking the genes in bins of greater regulatory complexity is significantly greater than that of bins of lower regulatory complexity in all three indices (Tukey-Kramer HSD, &#945; = 0.05). In the nonparametric bivariate density plots of intergenic DNA versus index value (right), each contour represents a boundary including 10% of the data. The innermost red contour includes 10% of the data points and excludes the other 90%. The outermost purple contour includes 90% of the data points, whereas 10% fall outside this boundary.</p>
					</text>
					<graphic file="gb-2004-5-4-r25-2"/>
				</fig>
				<p>Measurement of intergenic distance does not account for the possibility of regulatory information contained within the boundaries of a gene itself (for example, 5' and 3' untranslated regions and introns). However, transcriptional regulatory elements do occur in these regions (see for example <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>). In addition, regulatory elements can lie within or beyond adjacent genes (see for example <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>). Therefore, we established an alternative means of measuring the footprint of a gene that would take these scenarios into account. We generated sliding windows spanning many genes along each <it>D. melanogaster </it>chromosome and graphed the size of each window (in base pairs) relative to position on the chromosome. Of the window sizes tested (ranging from 5 to 50 genes), an 11-gene window was judged to provide the best resolution of peaks from background variation (Figure <figr fid="F3">3</figr> and data not shown). This window measures the size of the immediate neighborhood of the central gene in an 11-gene interval (1 central gene and 5 genes on either side), providing a broader view of the arrangement of nearby genes and potential regulatory regions. Each chromosome contains regions of high gene density, where 11 genes are tightly packed with little intervening DNA, and peaks of low gene density, where 11 genes and their associated intergenic DNA are widely spaced (for a typical example see Figure <figr fid="F3">3</figr>). Low gene density indicates that one or more genes within a window have a large amount of associated noncoding DNA. By our model, peaks of low gene density, which contain more intergenic DNA, should be more likely to contain genes of high regulatory complexity. To test this prediction on the X chromosome, we identified all genes within peaks greater than a visually selected cutoff of 250 kb. We then assessed the expression complexity of genes in these large windows using our expression indices. Although most genes in the <it>D. melanogaster </it>genome are unknown with respect to expression pattern and as a result do not have index values, peaks greater than 250 kb in size contain significantly more genes of high expression complexity than the average 11-gene window on the X chromosome (Figure <figr fid="F3">3</figr>; Welch ANOVA, <it>p </it>&lt; 0.008; Wilcoxon two-sample test, <it>p </it>&lt; 0.03). Thus, we observe a significant correlation between gene spacing and regulatory complexity with three independent measures of expression complexity, with two independent measures of locus size, and in two very different animals.</p>
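				<p>The sliding-window measurement can be sketched as follows, purely as an illustration. The sketch assumes hypothetical (name, start, end) gene tuples for a single chromosome and takes the size of a window to be the distance from the start of its first gene to the furthest end of any gene in it; the exact definition of window size, and the function names, are assumptions rather than details of the analysis reported here.</p>

```python
def window_spans(genes, window=11):
    """Return (central_gene_name, span_in_bp) for each sliding window.

    genes: list of (name, start, end) tuples on one chromosome
    (hypothetical input format). A window holds `window` consecutive
    genes ordered by start coordinate.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    spans = []
    for i in range(len(ordered) - window + 1):
        chunk = ordered[i:i + window]
        # Span runs from the first gene's start to the furthest gene end.
        span = max(g[2] for g in chunk) - chunk[0][1]
        central = chunk[window // 2][0]
        spans.append((central, span))
    return spans


def low_density_windows(spans, cutoff=250_000):
    """Keep windows whose span exceeds the cutoff (250 kb by default)."""
    return [(name, span) for name, span in spans if span > cutoff]


# Toy coordinates with a 3-gene window, not data from this study.
toy_genes = [
    ("g0", 0, 500),
    ("g1", 1000, 1500),
    ("g2", 2000, 2500),
    ("g3", 300000, 300500),
    ("g4", 301000, 301500),
]
toy_spans = window_spans(toy_genes, window=3)
toy_peaks = low_density_windows(toy_spans)
```

				<p>With an odd window size the central gene is unambiguous, which is one practical reason to prefer a window such as 11 genes in a sketch of this kind.</p>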
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Regions of low gene density contain significantly more genes of high regulatory complexity</p>
					</caption>
					<text>
						<p>Regions of low gene density contain significantly more genes of high regulatory complexity. <b>(a) </b>Window size (in base pairs) of an 11-gene sliding window across the X chromosome versus position along the chromosome. The horizontal line at 250,000 bp indicates the cutoff above which a window was designated as low density. A total of 53 windows larger than 250,000 bp were identified on the X chromosome. These windows overlap to generate 14 independent peaks, numbered 1 through 14. Normalized FBx and BDGPx scores for each gene were calculated by dividing the raw index score by the maximum score for that index. The normalized scores of all low-density windows were compared to the scores of all 11-gene windows on the chromosome. The expression complexity score for low gene density windows was significantly greater than the average score for all possible windows on the X chromosome (Welch ANOVA, <it>p </it>&lt; 0.008; Wilcoxon two-sample test, <it>p </it>&lt; 0.03). <b>(b) </b>The 11 genes flanking the highest point of each numbered peak on the X chromosome. Genes boxed in red fall in the top 20% of expression complexity by FBx or the top 24% by BDGPx. Genes in unshaded boxes have expression data available, but do not fall in the upper range of the FBx or BDGPx indices. Genes that are shaded, which represent the majority of genes in these windows, have no expression data available. This panel indicates only genes in the highest central peak. However, all genes within windows exceeding 250,000 bp in size were used for the statistical analysis described above.</p>
					</text>
					<graphic file="gb-2004-5-4-r25-3"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Functional classification and gene spacing</p>
				</st>
				<p>Much study of the evolution of development has focused on a relatively small subset of genes that govern multiple developmental processes <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. These genes typically encode transcription factors and signaling molecules, rather than metabolic enzymes or structural components of the cell. The repeated utilization of genes in these developmentally important classes predicts that these genes should require greater numbers of regulatory elements and larger stretches of intergenic DNA than genes with primarily housekeeping functions.</p>
				<p>To test this prediction we used functional categories based on Gene Ontology (GO) <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> and additional literature-derived functional groupings to investigate the correlation between gene spacing and functional classification. Because GO annotations for <it>D. melanogaster </it>and <it>C. elegans </it>use different categorizations, they are not directly comparable. Therefore, we selected GO categories of interest from <it>D. melanogaster </it>and used BLAST to determine the best match for each fly protein in the <it>C. elegans </it>proteome. The GO categories used were: pattern specification (GO:0007389), embryonic development (GO:0009790), specific RNA polymerase II transcription factors (GO:0003704), receptor activity (GO:0004872), cell differentiation (GO:0030154), metabolism (GO:0008152), structural constituents of the ribosome (GO:0003735), and general RNA polymerase II transcription factors (GO:0016251). Some genes (for example, <it>caudal</it>, <it>Notch</it>, <it>twist</it>, and others) are members of more than one selected GO category; however, we accounted for this in our analysis (see below and Materials and methods). In addition to the GO categories, we generated a list of housekeeping genes (HK set) by combining three lists of human housekeeping genes <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> and using BLAST to identify the best single match for these genes in the <it>D. melanogaster </it>and <it>C. elegans </it>proteomes. Finally, we analyzed genes present in single copy in <it>C. elegans</it>, <it>D. melanogaster </it>and the yeast <it>Saccharomyces cerevisiae </it>(CDY set) <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, which are likely to represent genes with primarily housekeeping functions <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
				<p>In both <it>C. elegans </it>and <it>D. melanogaster</it>, 'simple' gene groups with primarily ubiquitous or 'housekeeping' functions (CDY, general transcription factors, ribosomal constituents, metabolism and HK sets) are flanked by an average of 4-5 kb of intergenic DNA. In contrast, 'complex' groups with more diverse roles (embryonic development, pattern specification, and specific transcription factors) average 8-11 kb of intergenic DNA in <it>C. elegans </it>and 17-25 kb in <it>D. melanogaster </it>(Figure <figr fid="F4">4</figr>). Two groups, receptor activity and cell differentiation genes, were more variable between the two species, suggesting possible differences in the biological roles of these groups in the two organisms.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Functionally complex genes have more intergenic DNA than functionally simple genes</p>
					</caption>
					<text>
						<p>Functionally complex genes have more intergenic DNA than functionally simple genes. A comparison of intergenic distances among genes of different GO groups. The mean and median amounts of flanking intergenic DNA are shown for various functional categories of genes in <b>(a) </b><it>D. melanogaster </it>and <b>(b) </b><it>C. elegans </it>(black points and bars indicate mean value &#177; standard error; red bars indicate median values, red boxes enclose 25th-75th percentiles). Genes with low regulatory complexity are represented by the CDY, general RNA polymerase II (PolII) transcription factors, ribosomal components, metabolism, and housekeeping gene sets. Genes of high regulatory complexity are represented by receptor activity, cell differentiation, genes involved in embryonic development, genes involved in pattern specification, and specific RNA PolII transcription factors. All sets of low regulatory complexity have significantly less flanking intergenic DNA than all sets of high regulatory complexity regardless of species (Tukey-Kramer HSD, &#945; = 1 &#215; 10<sup>-4</sup>).</p>
					</text>
					<graphic file="gb-2004-5-4-r25-4"/>
				</fig>
				<p>We next pooled all genes in the five simple groups and all genes in the three complex groups to generate nonredundant gene sets. For these sets, we assessed the contribution of 5' and 3' noncoding regions to the total intergenic distance (Figure <figr fid="F5">5a</figr>). In both the <it>C. elegans </it>and <it>D. melanogaster </it>simple gene sets, 5' and 3' noncoding regions each contribute approximately 2 kb of DNA to the total intergenic distance. For the complex gene sets, total intergenic DNA is partitioned nearly equally between upstream and downstream sequences in <it>D. melanogaster</it>, whereas upstream DNA is significantly larger than downstream DNA in <it>C. elegans </it>(Figure <figr fid="F5">5a</figr>, Wilcoxon two-sample test, <it>p </it>&lt; 0.0001). These results suggest that <it>C. elegans cis</it>-regulatory elements largely occupy space upstream of the regulated gene, consistent with analysis of several <it>C. elegans </it>enhancers <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. In contrast, <it>D. melanogaster </it>appears equally likely to distribute regulatory information upstream or downstream of the gene, consistent with observations of extensive 3' regulatory regions in <it>D. melanogaster </it><abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. It is important to note that while the amount of intergenic DNA flanking groups of simple genes is not significantly different between animals (Figure <figr fid="F5">5a</figr>), genes that have complex functions in <it>D. melanogaster </it>are flanked by significantly more intergenic DNA than their <it>C. elegans </it>counterparts (Tukey-Kramer HSD, &#945; = 1 &#215; 10<sup>-4</sup>; Wilcoxon two-sample test, <it>p </it>&lt; 0.001; see Materials and methods).</p>
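				<p>The partition of flanking DNA into 5' and 3' components depends on gene orientation, which the following sketch makes explicit. It is illustrative only: it assumes hypothetical (name, start, end, strand) tuples for non-overlapping genes on one chromosome, with strand given as "+" or "-", and it is not the procedure described in Materials and methods.</p>

```python
def flanking_gaps(genes):
    """Return {name: (five_prime_gap, three_prime_gap)} for interior genes.

    genes: list of (name, start, end, strand) tuples on one chromosome
    (hypothetical input format), with strand either "+" or "-".
    """
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = {}
    # The first and last genes lack a neighbor on one side and are skipped.
    for i in range(1, len(ordered) - 1):
        name, start, end, strand = ordered[i]
        left = max(0, start - ordered[i - 1][2])
        right = max(0, ordered[i + 1][1] - end)
        # On the plus strand the 5' side lies at lower coordinates;
        # on the minus strand it lies at higher coordinates.
        gaps[name] = (left, right) if strand == "+" else (right, left)
    return gaps


# Toy coordinates, not data from this study.
toy_genes = [
    ("a", 0, 100, "+"),
    ("b", 300, 400, "+"),
    ("c", 1000, 1200, "-"),
    ("d", 1250, 1300, "+"),
]
toy_gaps = flanking_gaps(toy_genes)
```

				<p>In this toy example, gene c lies on the minus strand, so its 5' gap is the 50 bp to its right rather than the 600 bp to its left.</p>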
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Complex genes have more intergenic DNA in <it>D. melanogaster </it>than in <it>C. elegans</it></p>
					</caption>
					<text>
						<p>Complex genes have more intergenic DNA in <it>D. melanogaster </it>than in <it>C. elegans</it>. <b>(a) </b>Mean 5' flanking DNA (5'), 3' flanking DNA (3'), and total intergenic DNA (T; all &#177; standard error) is shown for nonredundant groups of simple genes (CDY, general RNA PolII transcription factors, ribosomal components, metabolism, and housekeeping) and complex genes (embryonic development, pattern specification, and specific RNA PolII transcription factors) in <it>C. elegans </it>(blue) and <it>D. melanogaster </it>(red). <it>C. elegans </it>complex genes have significantly more 5' flanking DNA than 3' flanking DNA (Wilcoxon two-sample test, <it>p </it>&lt; 0.0001). The <it>C. elegans </it>complex group is flanked by significantly less DNA than the <it>D. melanogaster </it>complex group (Tukey-Kramer HSD, &#945; = 1 &#215; 10<sup>-4</sup>). <b>(b) </b>Distribution of intergenic DNA for all genes in <it>C. elegans </it>(blue) and <it>D. melanogaster </it>(red). In general, genes in <it>C. elegans </it>are more evenly spaced than in <it>D. melanogaster</it>. The largest class of genes in <it>D. melanogaster </it>has less than 1,000 bp of intergenic DNA separating neighboring genes, whereas the largest class in <it>C. elegans </it>has 1,000-2,000 bp. Thus, <it>D. melanogaster </it>does not have a euchromatic genome that is generally expanded with respect to <it>C. elegans</it>, even though it has many more genes with greater than 19,000 bp of flanking intergenic DNA.</p>
					</text>
					<graphic file="gb-2004-5-4-r25-5"/>
				</fig>
				<p>Approximately 15% of <it>C. elegans </it>genes are predicted to be located in co-regulated operons <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Intergenic distance between genes within operons is likely to underestimate the size of DNA used to regulate these genes and this underestimate could contribute to the observed difference in complex gene spacing between <it>C. elegans </it>and <it>D. melanogaster</it>, which does not organize genes into operons. We determined that approximately 12% of genes in the complex groups and approximately 37% of genes in the simple groups are predicted to be organized into operons in <it>C. elegans </it>(data not shown). Removing these genes from their respective datasets had no effect on the observed difference between <it>D. melanogaster </it>and <it>C. elegans </it>gene groups (Tukey-Kramer HSD, &#945; = 1 &#215; 10<sup>-4</sup>).</p>
				<p>We were also concerned that general euchromatic genome expansion in <it>D. melanogaster </it>or euchromatic genome compaction in <it>C. elegans </it>could account for the difference in the amount of intergenic DNA associated with complex genes. To assess this possibility, we analyzed the distribution of intergenic DNA measurements for all genes in both animals (Figure <figr fid="F5">5b</figr>). The <it>D. melanogaster </it>genome, which has approximately 55 Mb of intergenic DNA, has more genes with large amounts of intergenic DNA than does the <it>C. elegans </it>genome, which has approximately 47 Mb of intergenic DNA (estimated using upstream and downstream intergenic distances as calculated in this study). However, this difference in intergenic spacing is not uniformly distributed, as <it>D. melanogaster </it>shows more regions of both dense and highly dispersed gene spacing than <it>C. elegans</it>, whose genes are more evenly distributed (Figure <figr fid="F5">5b</figr>). Thus, the larger intergenic regions seen in <it>D. melanogaster </it>genes of complex function are not consistent with a general genome-wide expansion in flies or compaction in worms.</p>
				<p>Finally, we examined individual genes of complex function to determine whether the difference observed at the group level is reflected at the level of individual genes. From the CDY set and KOG (euKaryotic clusters of Orthologous Genes <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>) we identified orthologous pairs of genes or gene families in <it>D. melanogaster </it>and <it>C. elegans</it>. We then selected genes known or expected to be developmentally important in <it>D. melanogaster</it>, and confirmed their orthologous relationships with <it>C. elegans </it>genes using the KOGnitor comparison tool. These candidate groups yielded 29 relatively clear single-copy orthologs and many orthologous gene families. For a representative group of 49 <it>D. melanogaster </it>genes and their <it>C. elegans </it>counterparts (including all 29 single-copy orthologs identified and 5 gene families, Figure <figr fid="F6">6a</figr>), the mean intergenic interval is 27,928 bp in <it>D. melanogaster </it>and 7,670 bp in <it>C. elegans </it>(an approximately 3.6-fold difference), consistent with the trend observed at the group level (Figure <figr fid="F4">4a</figr>). In addition, many of the <it>D. melanogaster </it>genes are located in gene-sparse regions of the genome and have larger introns (Figure <figr fid="F6">6b</figr>), suggesting that they have even more space available for potential regulatory elements than indicated by the larger flanking regions alone.</p>
				<fig id="F6">
					<title>
						<p>Figure 6</p>
					</title>
					<caption>
						<p>Developmentally important genes in <it>D. melanogaster </it>have larger intergenic intervals than their <it>C. elegans </it>counterparts</p>
					</caption>
					<text>
						<p>Developmentally important genes in <it>D. melanogaster </it>have larger intergenic intervals than their <it>C. elegans </it>counterparts. <b>(a) </b>Forty-nine developmentally important genes from <it>D. melanogaster </it>and their <it>C. elegans </it>counterparts. Genes in the top section represent orthologs, defined by KOG. Subsequent sections represent gene families. Listing of genes in different species on the same line within gene families does not imply that they are orthologous. The mean intergenic size for the <it>D. melanogaster </it>genes is 27,928 bp. The mean intergenic size for the <it>C. elegans </it>genes is 7,670 bp. <b>(b) </b>Genomic regions of four representative gene sets in <it>D. melanogaster </it>(red) and <it>C. elegans </it>(blue). Orange boxes designate exons of the indicated genes. Gray boxes designate exons of neighboring genes. Note that genomic intervals are typically larger in <it>D. melanogaster </it>than in <it>C. elegans</it>, often owing to both larger flanking noncoding regions and larger introns. The total euchromatic genome of <it>D. melanogaster </it>is estimated at 117 Mb and the euchromatic genome of <it>C. elegans </it>is estimated at 100 Mb. The overall gene distribution within the genome is denser in flies than worms, suggesting that the larger regions of noncoding DNA associated with these representative complex genes are specifically allocated to these loci.</p>
					</text>
					<graphic file="gb-2004-5-4-r25-6"/>
				</fig>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>We have examined the relationship between the regulatory complexity of a gene and the spacing of that gene with respect to its neighbors in <it>D. melanogaster </it>and <it>C. elegans</it>. We show that in each animal developmentally important genes expected to possess high levels of regulatory information occupy more space in the genome than other gene classes. This regulatory information may comprise enhancer elements with well-defined binding sites for transcription factors, insulator elements, which contribute to the precise expression pattern of a gene by preventing cross-talk between enhancers <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, and other known and unknown regulatory motifs. In addition, developmentally important genes in <it>D. melanogaster </it>have more space for regulatory information than the corresponding <it>C. elegans </it>genes, and <it>C. elegans </it>tends to apportion its noncoding DNA upstream of the gene whereas <it>D. melanogaster </it>shows no significant bias. These results show that regulatory information shapes genome architecture and provide support at the genomic level for a model in which the expansion of regulatory information facilitates increased morphological complexity in metazoa.</p>
			<sec>
				<st>
					<p>Reliability of expression indices</p>
				</st>
				<p>Because direct measurement of regulatory complexity for all genes in the <it>D. melanogaster </it>and <it>C. elegans </it>genomes is not possible, we used several surrogate measures of regulatory complexity. These surrogates necessarily introduce uncertainty into our assessment of regulatory complexity, and here we attempt to assess the effect of these uncertainties on our conclusions.</p>
				<p>All three indices will tend to underestimate the true complexity of a gene's full expression pattern simply because the expression of very few genes has been surveyed in all tissues throughout the life cycle of any animal. For instance, the BDGPx only considers embryonic expression. Furthermore, little information is available on environmentally responsive gene expression, as most investigation has focused on developmental profiles of expression under standardized conditions. However, the systematic underestimation of regulatory complexity due to limited sampling across environmental conditions or developmental stages applies to all genes, not preferentially to genes expressed in either a simple or complex pattern, and therefore should not significantly bias our conclusions.</p>
				<p>Our two literature-derived indices (FBx and WBx) suffer from ascertainment bias. Genes involved in multiple developmental processes or genes that have large genomic footprints are more readily identified in genetic screens and are more likely to elicit sustained investigation. This situation has led to a relative over-representation of developmentally important genes in the literature-based indices and a probable overestimation of regulatory complexity for genes with very high FBx or WBx values. By combining genes with the highest index values into a single group, the binning of individual index values reduces the effect of overestimating regulatory complexity. In addition, GO groups and the <it>in situ </it>hybridization index (BDGPx) are immune to this sampling issue because they consider either functional classification or a completely random gene set, respectively, and each clearly shows the same trend as the literature-derived indices.</p>
				<p>Curation of the data in all three indices may also introduce uncertainty into our results. For instance, the BDGP <it>in situ </it>project annotates gene expression maintained over multiple developmental stages in a single organ as multiple distinct entries <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Similarly, housekeeping genes, whose expression may be driven by only one <it>cis</it>-regulatory element, are found in many tissues, and so the BDGPx will tend to overestimate the regulatory complexity of these genes. However, the BDGP project only annotates genes with some degree of tissue specificity, omitting ubiquitously expressed genes <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. A simple gene whose regulatory complexity has been overestimated would introduce a smaller value for intergenic distance into the high regulatory complexity group. Therefore, overestimation of regulatory complexity for simple genes should dilute, rather than enhance, the positive correlation between regulatory complexity and intergenic distance. Manually collapsing tissue annotations across developmental stages improved the correlation between intergenic DNA size and the BDGPx (data not shown), but we report the unmodified BDGP data here to avoid investigator-derived bias in our estimates of regulatory complexity. Moreover, the GO-derived groups are not subject to the same systematic biases as the other indices but show the same overall result.</p>
				<p>While it is generally accepted that complex gene expression requires complex regulatory control, we must consider the degree to which expression complexity is a legitimate proxy for regulatory complexity. The expression of particular genes in distinct morphological fields, tissues and organs is consistently controlled by physically and functionally discrete <it>cis</it>-regulatory elements (reviewed in <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>). Conversely, gene expression in populations of cells with shared identity is often controlled by a single regulatory element (see for example <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>). Thus, genes that have a complex expression pattern tend to use a greater number of <it>cis</it>-regulatory elements than genes expressed in a single tissue, location or cell type. This trend clearly supports the use of expression complexity as a surrogate for regulatory complexity. However, even genes that have a simple expression pattern occasionally use multiple <it>cis</it>-regulatory elements (see for example <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>), and an apparently complex expression pattern will sometimes be driven by a relatively simple control element (see for example <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>). As a relative measure, therefore, complexity of expression pattern should faithfully approximate regulatory complexity for a group of genes, but will not reliably predict the absolute number of <it>cis</it>-regulatory elements used by any individual gene.</p>
			</sec>
			<sec>
				<st>
					<p>Regulatory DNA and genome architecture</p>
				</st>
				<p>The distribution of regulatory information among genes in the genomes of <it>D. melanogaster </it>and <it>C. elegans </it>is not uniform. All three expression indices indicate that most genes are expressed in simple or limited domains whereas relatively few genes are expressed in a wide variety of specific tissues (Figure <figr fid="F1">1</figr>). This observation is consistent with known principles of animal development. A relatively small set of genes, primarily transcription factors and signaling molecules, play a disproportionate role in the development of metazoans (reviewed in <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>). These genes are used repeatedly during development to generate the basic body plan and specify organ identity. Once this morphological ground plan is established, a larger suite of tissue-specific genes is deployed during terminal differentiation. Accordingly, transcription factors and signaling molecules consistently have high values in our expression indices (Figure <figr fid="F4">4</figr> and data not shown) while genes of low regulatory complexity comprise the bulk of the genome.</p>
				<p>We show here how these relatively few genes of high regulatory complexity have accommodated their need for increased amounts of regulatory information. An increase in regulatory information will require either an increase in information density or an increase in the space allocated to storing that information. If the size of intergenic DNA in metazoan genomes were essentially unconstrained, an increase in the space devoted to information storage would escape notice in the background fluctuation of intergenic distance and would have no discernable effect on the distribution of genes within the genome. DNA with little informational content would predominate, and even genes that require a large number of regulatory elements would have more than enough intergenic DNA to accommodate those elements without apparent expansion. If, however, functional regulatory DNA represents a significant portion of the intergenic DNA in a genome, then there should be a direct correlation between regulatory information content and quantity of intergenic DNA <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. That is, genes with many regulatory elements will require more space, and this space will have a significant impact on the local arrangement of genes. Indeed, we find that genes predicted to have more regulatory elements occupy significantly more space than do their simple neighbors. The fact that we can see this relationship suggests that the genomes of <it>C. elegans </it>and <it>D. melanogaster </it>possess a high ratio of functional regulatory DNA to nonfunctional noncoding DNA.</p>
				<p>It is interesting to note that evidence suggesting regulatory DNA in <it>C. elegans </it>is most often positioned upstream of a gene's promoter <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> is strongly supported by our analysis of the relative size of 5' and 3' noncoding intervals for the complex gene sets. No such bias in the distribution of noncoding DNA is apparent in <it>D. melanogaster</it>, suggesting that these two animals may have different constraints on the location of regulatory information relative to the promoter of a gene.</p>
			</sec>
			<sec>
				<st>
					<p>Evolution of genome architecture</p>
				</st>
				<p>How does this architecture arise? The net difference between the rate of DNA deletion and insertion appears to determine the direction of genome expansion or compaction in many organisms <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Both the <it>D. melanogaster </it>and <it>C. elegans </it>lineages have unusually high rates of DNA deletion, leading to compact genomes <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr></abbrgrp>. For instance, the rate of DNA loss is 40 times higher in the approximately 180 Mb <it>D. melanogaster </it>genome than in the approximately 1,980 Mb genome of Hawaiian crickets <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, and is 60 times faster in <it>Drosophila </it>than in mammals <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. When the DNA-deletion rate is significantly greater than the rate of DNA insertion, deletion will predominate in reducing genome size and sculpting genome architecture. As deletions become more and more likely to remove functional DNA, selection against further deletion should tend to stabilize the minimum size of intergenic regions, and the underlying architecture of the genome will emerge.</p>
				<p>Our work suggests that high rates of DNA loss may sculpt the spacing of genes toward minimum functional requirements for regulatory DNA. Such functional constraints in noncoding DNA are known to affect distributions of insertions and/or deletions (indels). For example, constraints imposed by intronic splicing requirements influence the pattern of deletion and insertion observed in <it>D. melanogaster </it>introns <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. Comparison of noncoding regions of different <it>Drosophila </it>species indicates that conserved noncoding sequences are often found in small blocks, with conserved spacing between the blocks <abbrgrp><abbr bid="B58">58</abbr><abbr bid="B59">59</abbr></abbrgrp>. This suggests that spacing constraints also act in intergenic regions, potentially to preserve spacing between specific transcription factor binding sites or other regulatory elements, or more generally to provide sufficient physical space to insulate regulatory elements from one another. In addition, interference selection, lowered recombination due to segregation of weakly selected mutations, was suggested to account for a correlation between intergenic distance and coding region length <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. A proposed alternative, that longer genes are functionally more complex and therefore require larger noncoding regions <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>, now finds support in our observed correlation between intergenic distance and regulatory complexity. Interference selection may itself contribute to the evolution of complex regulatory regions: minimum spacers, favored in the reduction of recombination interference, may be required for recombination of complex modular regulatory elements.</p>
				<p>Other compact genomes, such as that of the teleost fish <it>Fugu rubripes</it>, are also likely to be the product of greater rates of DNA loss and are expected to show the relationship between regulatory complexity and intergenic distance demonstrated here. Even in the large human genome, there is evidence that some regions have experienced compaction where gene density is increased. Dense gene clustering implies a relative lack of local regulatory complexity and predicts that the clustered genes should have relatively simple expression patterns. This prediction is indeed supported by the presence of tissue-specific and housekeeping gene clusters and regions of high gene density in the human genome <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B61">61</abbr></abbrgrp>. Thus, the emergence of some regions of high gene density and clusters may reflect deletion acting to reveal local regulatory complexity, rather than the organization of the genome into chromatin domains or multigene transcriptional groups. In addition, the association between gene spacing and regulatory complexity could be exploited in the analysis of novel genes and genomes. Based on our results, the relative regulatory complexity of a 'novel' gene might be inferred on the basis of the architecture of its local genomic neighborhood.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>Because of the vast size of animal genomes compared to the small, relatively discrete functional elements within them, regulatory DNA has been presumed to exert little, if any, global effect on metazoan genome organization. Here we have shown that spatial requirements for regulatory DNA shape the density of genes in the genomes of <it>D. melanogaster </it>and <it>C. elegans</it>. Further, we propose that small DNA deletions, constrained by functional blocks of DNA, are the primary mechanism for sculpting genome architecture. Repeated bouts of insertion and deletion may actively shape gene distribution - globally in organisms with compact genomes, and locally in organisms with expanded genomes.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<sec>
				<st>
					<p>Datasets</p>
				</st>
				<p>The <it>D. melanogaster </it>genome annotations version 3.1 <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> were obtained from the BDGP. Only genes in the euchromatic portion of the genome were used for analysis. <it>C. elegans </it>genomic data were obtained from WormBase genome freeze WS100 <abbrgrp><abbr bid="B63">63</abbr><abbr bid="B64">64</abbr></abbrgrp>.</p>
				<p>Expression data for <it>D. melanogaster </it>were obtained from two independent sources. First, we determined the number of 'Expression and Phenotype' tags for all <it>D. melanogaster </it>genes listed in FlyBase <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. Second, we measured embryonic expression complexity by counting the 'body parts' listed in the BDGP <it>in situ </it>hybridization database <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> (accessed 10 October 2003). This project uses a controlled vocabulary to annotate the expression of each gene during embryogenesis <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. <it>C. elegans </it>expression data were obtained through AQL (Acedb Query Language) queries of WormBase for all genes that possessed 'Expr_pattern' entries.</p>
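				<p>As an illustration of how such an index can be tallied, the following Python sketch counts annotation entries per gene. It is not the code used in this study: the function name 'expression_index', the (gene, term) input layout and the example identifiers are assumptions made for illustration, and every entry is counted as listed, without collapsing related terms.</p>

```python
from collections import Counter


def expression_index(annotations):
    """Count annotation entries per gene.

    annotations: iterable of (gene_id, term) pairs, one pair per
    database entry (hypothetical layout, not a real export format).
    """
    return Counter(gene for gene, _term in annotations)


# Made-up example records; gene names and terms are placeholders.
records = [
    ("geneA", "term1"),
    ("geneA", "term2"),
    ("geneB", "term1"),
]
index = expression_index(records)
```

				<p>In this sketch a gene with two entries receives an index value of 2, which is then the value that would be assigned to a bin.</p>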
				<p>The housekeeping (HK) gene set was generated by combining three lists of proposed human housekeeping genes <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. This nonredundant list was compared by BLAST <abbrgrp><abbr bid="B67">67</abbr></abbrgrp> to the <it>D. melanogaster </it>and <it>C. elegans </it>genomes. We retained only the best hit in each genome, and only if it passed an E-value threshold of 1 &#215; 10<sup>-20</sup>. The CDY (<it>C. elegans, D. melanogaster</it>, and yeast) dataset is derived from single-copy genes shared by <it>Saccharomyces</it>, <it>Drosophila </it>and <it>Caenorhabditis </it><abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. We infer that these genes will largely have shared basal functions and few cell-type-specific functions <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Gene lists and sequences were retrieved by EnsMart from the Ensembl Genome Browser <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>. Because the <it>C. elegans </it>genome annotation employs different GO terms from that of <it>Drosophila</it>, we placed <it>C. elegans </it>genes into corresponding GO categories by BLAST of the <it>D. melanogaster </it>GO gene sets against the <it>C. elegans </it>proteome.</p>
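				<p>The best-hit filtering step described for the housekeeping gene set can be sketched in Python as follows. This is a minimal illustration, not the procedure actually run: the function name 'best_hits', the (query, subject, E-value) tuple layout and the example identifiers are assumptions, and the sketch assumes that a hit passes when its E-value is at or below the threshold.</p>

```python
def best_hits(hits, max_evalue=1e-20):
    """Keep the single best hit per query, if it passes the threshold.

    hits: iterable of (query_id, subject_id, evalue) tuples
    (hypothetical layout). Assumption: lower E-values are better and
    a hit passes when max_evalue >= evalue.
    """
    best = {}
    for query, subject, evalue in hits:
        # Replace the stored hit only when the new one has a lower E-value.
        if query not in best or best[query][1] > evalue:
            best[query] = (subject, evalue)
    return {
        query: subject
        for query, (subject, evalue) in best.items()
        if max_evalue >= evalue
    }


# Made-up example: HK1 has two hits, HK2 has only a weak hit.
example_hits = [
    ("HK1", "geneA", 1e-50),
    ("HK1", "geneB", 1e-30),
    ("HK2", "geneC", 1e-5),
]
retained = best_hits(example_hits)
```

				<p>Selecting the best hit first and applying the threshold afterwards means that a query whose best hit is weak contributes no gene at all, rather than falling back to a second hit.</p>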
			</sec>
			<sec>
				<st>
					<p>Spacing analysis</p>
				</st>
				<p>We wrote several PERL programs (available upon request) to parse <it>C. elegans </it>and <it>D. melanogaster </it>genomic data and calculate intergenic distances. For most genes, we defined upstream distance as the distance between the start of a gene's first exon and the boundary of the closest upstream neighboring exon (irrespective of DNA strand). We defined downstream distance as the distance between the end of a gene's last exon and the boundary of the closest downstream neighboring exon. Total intergenic distance was defined as the sum of the upstream and downstream distances. However, both genomes contained examples of genes with overlapping or interdigitated exons. In cases where exons overlapped with one another, intergenic distance was defined as zero. In cases where an exon was located within the intron of another gene, the intergenic distance was calculated from the boundary of the exon of interest to the nearest intron/exon boundary.</p>
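				<p>The distance definitions above can be illustrated with a short Python sketch. It is not the PERL code used in this study. The function names, the representation of exons as half-open (start, end) coordinate pairs on a single chromosome, and the treatment of overlap (a flank is set to zero when a neighboring exon overlaps the terminal exon on that side) are assumptions made for illustration.</p>

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return min(a[1], b[1]) > max(a[0], b[0])


def flanking_distances(gene_exons, strand, neighbor_exons):
    """Return (upstream, downstream, total) distances in bp.

    gene_exons / neighbor_exons: lists of (start, end) pairs, half-open,
    on the same chromosome (hypothetical layout). Neighbors are used
    irrespective of their strand. A flank with no neighbor is None.
    """
    exons = sorted(gene_exons)
    first, last = exons[0], exons[-1]

    # Left flank: zero on overlap, else gap to the nearest exon ending
    # at or before the start of the gene's leftmost exon.
    if any(overlaps(first, n) for n in neighbor_exons):
        left = 0
    else:
        ends = [n[1] for n in neighbor_exons if first[0] >= n[1]]
        left = first[0] - max(ends) if ends else None

    # Right flank: zero on overlap, else gap to the nearest exon starting
    # at or after the end of the gene's rightmost exon.
    if any(overlaps(last, n) for n in neighbor_exons):
        right = 0
    else:
        starts = [n[0] for n in neighbor_exons if n[0] >= last[1]]
        right = min(starts) - last[1] if starts else None

    # Upstream is the left flank for a plus-strand gene, the right
    # flank for a minus-strand gene.
    upstream, downstream = (left, right) if strand == "+" else (right, left)
    total = None if None in (upstream, downstream) else upstream + downstream
    return upstream, downstream, total


# Made-up coordinates: one neighbor exon lies inside the gene's intron.
gene = [(1000, 1200), (1500, 1800)]
neighbors = [(100, 400), (1300, 1350), (2500, 2700)]
plus_result = flanking_distances(gene, "+", neighbors)
minus_result = flanking_distances(gene, "-", neighbors)
```

				<p>In this sketch an exon that lies inside an intron of the gene of interest does not affect that gene's flanks, whereas for the nested gene the nearest exon boundary of its host defines the flank, in line with the nearest-boundary rule described above.</p>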
			</sec>
			<sec>
				<st>
					<p>Data analysis and statistics</p>
				</st>
				<p>Data management and analysis were performed using a combination of PERL programs, Microsoft Excel and JMP 3.0 (SAS Institute).</p>
				<p>Composition of individual indices and bins. FlyBase index (1,879 genes): Bin 1, genes with an index value of 1, corresponding to 1 'Expression and Phenotype' entry in FlyBase, <it>N </it>= 108 entries; Bin 2, two entries, <it>N </it>= 227; Bin 3, three entries, <it>N </it>= 172; Bin 4, four to five entries, <it>N </it>= 184; Bin 5, six to eight, <it>N </it>= 206; Bin 6, 9-13, <it>N </it>= 235; Bin 7, 14-18, <it>N </it>= 184; Bin 8, 19-29, <it>N </it>= 187; Bin 9, 30-49, <it>N </it>= 193; Bin 10, 50-336, <it>N </it>= 183.</p>
				<p>BDGP index (1,698 genes): Bin 1, one body part listed, <it>N </it>= 163; Bin 2, two body parts, <it>N </it>= 184; Bin 3, three body parts, <it>N </it>= 172; Bin 4, four body parts, <it>N </it>= 159; Bin 5, five body parts, <it>N </it>= 145; Bin 6, six to seven body parts, <it>N </it>= 201; Bin 7, eight to nine body parts, <it>N </it>= 180; Bin 8, 10-13, <it>N </it>= 144; Bin 9, 12-14, <it>N </it>= 142; Bin 10, 15-42, <it>N </it>= 208.</p>
				<p>WormBase index (1,130 genes): Bin 1, one 'Expr_pattern' entry, <it>N </it>= 357; Bin 2, two entries, <it>N </it>= 192; Bin 3, three entries, <it>N </it>= 116; Bin 4, four entries, <it>N </it>= 123; Bin 5, five entries, <it>N </it>= 98; Bin 6, six entries, <it>N </it>= 61; Bin 7, seven entries, <it>N </it>= 52; Bin 8, eight entries, <it>N </it>= 39; Bin 9, 9-11, <it>N </it>= 43; Bin 10, 12-27, <it>N </it>= 49.</p>
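				<p>Assignment of index values to bins can be sketched in Python as follows, using the FlyBase index bin ranges listed above as the worked example. The function names 'assign_bin' and 'bin_counts' are hypothetical, and the sketch only illustrates the binning rule; it is not the code used to build the indices.</p>

```python
from bisect import bisect_left
from collections import Counter

# Upper bound of each FlyBase index bin, taken from the ranges listed
# above (1, 2, 3, 4-5, 6-8, 9-13, 14-18, 19-29, 30-49, 50-336).
FLYBASE_UPPER_BOUNDS = [1, 2, 3, 5, 8, 13, 18, 29, 49, 336]


def assign_bin(value, upper_bounds=FLYBASE_UPPER_BOUNDS):
    """Return the 1-based bin number for an index value."""
    if not (value >= 1 and upper_bounds[-1] >= value):
        raise ValueError("index value %r is outside the binned range" % value)
    # bisect_left finds the first bin whose upper bound is >= value.
    return bisect_left(upper_bounds, value) + 1


def bin_counts(index_values, upper_bounds=FLYBASE_UPPER_BOUNDS):
    """Count how many genes fall into each bin."""
    return Counter(assign_bin(v, upper_bounds) for v in index_values)


# Made-up index values for four genes.
counts = bin_counts([1, 1, 7, 60])
```

				<p>Passing a different list of upper bounds applies the same rule to the other indices.</p>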
				<p>Comparison of all pairs of bins in each index was performed using Tukey-Kramer HSD. As the size of intergenic DNA in each bin approximates a log-normal distribution (Figure <figr fid="F4">4</figr>, and data not shown) we compared both raw and log-transformed measurements. In all cases bins of higher inferred complexity tended to have higher average measures of intergenic DNA than bins of lower inferred complexity (Tukey-Kramer HSD, &#945; = 0.05).</p>
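				<p>The raw and log-transformed summaries compared here can be sketched in Python as follows. This illustration only computes per-bin means on both scales and does not implement the Tukey-Kramer HSD comparison, for which a statistics package was used. The function name 'summarize_bins', the base-10 logarithm and the exclusion of zero distances from the log-scale mean are assumptions of the sketch, not details reported for the analysis.</p>

```python
import math
from statistics import mean


def summarize_bins(distances_by_bin):
    """Return per-bin means of raw and log10-transformed distances.

    distances_by_bin: dict mapping bin number to a list of intergenic
    distances in bp (hypothetical layout). Assumption: distances of
    zero are left out of the log-scale mean because log10(0) is
    undefined.
    """
    summary = {}
    for bin_id, distances in distances_by_bin.items():
        logs = [math.log10(d) for d in distances if d > 0]
        summary[bin_id] = {
            "raw_mean": mean(distances),
            "log10_mean": mean(logs) if logs else None,
        }
    return summary


# Made-up distances for two bins.
summary = summarize_bins({1: [100, 1000], 2: [1000, 10000]})
```

				<p>For a distribution that is approximately log-normal, the mean on the log scale is less sensitive to a few very large intergenic regions than the raw mean, which is one reason to compare both.</p>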
				<p>Composition of functional groups: CDY, Ce N = 1,237, Dm N = 1,250; general transcription factors, Ce N = 43, Dm N = 43; HK, Ce N = 540, Dm N = 609; pattern specification, Ce N = 73, Dm N = 73; embryonic development, Ce N = 88, Dm N = 88; specific transcription factors, Ce N = 45, Dm N = 45; metabolism, Ce N = 881, Dm N = 881; cell differentiation, Ce N = 46, Dm N = 46; receptor activity, Ce N = 106, Dm N = 106; ribosome constituents, Ce N = 93, Dm N = 93. The mean size of the intergenic DNA associated with each group suggested that the simple gene groups are not significantly different between species, but that both simple groups are smaller than both complex groups and that the <it>C. elegans </it>complex group is smaller than the <it>D. melanogaster </it>complex group (Tukey-Kramer HSD, &#945; &lt; 1 &#215; 10<sup>-4</sup>). This interpretation was confirmed by independent inspection of the intergenic DNA size distributions for each group. Complex groups had many more genes with large intergenic regions than simple groups did. Comparison between the <it>C. elegans </it>complex group and the <it>D. melanogaster </it>complex group was complicated by the observation that the <it>D. melanogaster </it>group contained both more genes with smaller than average intergenic regions and many more genes with much larger than average intergenic measures. We divided both raw and log-transformed measures from <it>D. melanogaster </it>and <it>C. elegans </it>into halves containing the largest and smallest 50% of genes. The largest 50% of complex genes in <it>D. melanogaster </it>is flanked by significantly more DNA than the largest 50% of <it>C. elegans </it>complex genes (Wilcoxon two-sample test, <it>p </it>&lt; 0.001).</p>
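				<p>The split into halves and the two-sample comparison can be sketched in Python as follows. This is a simplified illustration rather than the JMP analysis used in this study: the function names are hypothetical, the example values are made up, and the Wilcoxon rank-sum test is approximated by a normal distribution without a tie correction to the variance, which is only reasonable for samples that are not very small.</p>

```python
import math


def upper_half(values):
    """Return the largest 50% of the values (rounded down if odd)."""
    ordered = sorted(values)
    return ordered[len(ordered) - len(ordered) // 2:]


def rank_sum_z(x, y):
    """Normal-approximation z score for the rank sum of sample x."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while len(pooled) > i:
        # Find the run of tied values starting at position i.
        j = i
        while len(pooled) > j + 1 and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w


def two_sided_p(z):
    """Two-sided p value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up intergenic distances (bp) for two hypothetical gene groups.
group_a = [500, 900, 20000, 35000, 41000, 60000]
group_b = [800, 1500, 4000, 6000, 7500, 9000]
z = rank_sum_z(upper_half(group_a), upper_half(group_b))
p = two_sided_p(z)
```

				<p>A positive z score indicates that the first sample tends to hold the larger values. Restricting the comparison to the largest half of each group focuses the test on the genes with extensive flanking DNA, where the two distributions differ most.</p>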
			</sec>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>An Excel file containing the primary data used for the three expression indices, the <it>D. melanogaster </it>X chromosome, and the GO groups, is included (Additional data file <supplr sid="s1">1</supplr>).</p>
			<suppl id="s1">
				<title>
					<p>Additional data file 1</p>
				</title>
				<caption>
					<p>An Excel file containing the primary data used for the three expression indices, the <it>D. melanogaster </it>X chromosome, and the GO groups</p>
				</caption>
				<text>
					<p>An Excel file containing the primary data used for the three expression indices, the <it>D. melanogaster </it>X chromosome, and the GO groups</p>
				</text>
				<file name="gb-2004-5-4-r25-s1.xls">
					<p>Click here for additional data file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank Dan Lautenschleger for help with PERL scripts, and Barry Williams, John Yoder and ShengQiang Shu for their input and assistance. C.E.N. is supported by NRSA#5 F32 HD41314-02. B.M.H. is supported by NRSA #F32GM65737-02. S.B.C. is an Investigator of the Howard Hughes Medical Institute.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Heterochromatin and epigenetic control of gene expression.</p>
				</title>
				<aug>
					<au>
						<snm>Grewal</snm>
						<fnm>SI</fnm>
					</au>
					<au>
						<snm>Moazed</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2003</pubdate>
				<volume>301</volume>
				<fpage>798</fpage>
				<lpage>802</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1086887</pubid>
						<pubid idtype="pmpid" link="fulltext">12907790</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>The human genome: organization and evolutionary history.</p>
				</title>
				<aug>
					<au>
						<snm>Bernardi</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Annu Rev Genet</source>
				<pubdate>1995</pubdate>
				<volume>29</volume>
				<fpage>445</fpage>
				<lpage>476</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.ge.29.120195.002305</pubid>
						<pubid idtype="pmpid" link="fulltext">8825483</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>The distribution of genes in the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>Mouchiroud</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>D'Onofrio</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Aissani</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Macaya</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Gautier</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Bernardi</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>1991</pubdate>
				<volume>100</volume>
				<fpage>181</fpage>
				<lpage>187</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0378-1119(91)90364-H</pubid>
						<pubid idtype="pmpid">2055469</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Expression patterns and gene distribution in the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>D'Onofrio</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2002</pubdate>
				<volume>300</volume>
				<fpage>155</fpage>
				<lpage>160</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(02)01048-X</pubid>
						<pubid idtype="pmpid" link="fulltext">12468096</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Shaping animal body plans in development and evolution by modulation of Hox expression patterns.</p>
				</title>
				<aug>
					<au>
						<snm>Gellon</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>McGinnis</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>BioEssays</source>
				<pubdate>1998</pubdate>
				<volume>20</volume>
				<fpage>116</fpage>
				<lpage>125</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/(SICI)1521-1878(199802)20:2&lt;116::AID-BIES4&gt;3.3.CO;2-N</pubid>
						<pubid idtype="pmpid" link="fulltext">9631657</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes.</p>
				</title>
				<aug>
					<au>
						<snm>Warrington</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Nair</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mahadevappa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Tsyganskaya</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Physiol Genomics</source>
				<pubdate>2000</pubdate>
				<volume>2</volume>
				<fpage>143</fpage>
				<lpage>147</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11015593</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>A compendium of gene expression in normal human tissues.</p>
				</title>
				<aug>
					<au>
						<snm>Hsiao</snm>
						<fnm>LL</fnm>
					</au>
					<au>
						<snm>Dangond</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Yoshida</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Hong</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Jensen</snm>
						<fnm>RV</fnm>
					</au>
					<au>
						<snm>Misra</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Dillon</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>KF</fnm>
					</au>
					<au>
						<snm>Clark</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Haverty</snm>
						<fnm>P</fnm>
					</au>
					<etal/>
				</aug>
				<source>Physiol Genomics</source>
				<pubdate>2001</pubdate>
				<volume>7</volume>
				<fpage>97</fpage>
				<lpage>104</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11773596</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Human housekeeping genes are compact.</p>
				</title>
				<aug>
					<au>
						<snm>Eisenberg</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Levanon</snm>
						<fnm>EY</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<fpage>362</fpage>
				<lpage>365</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(03)00140-9</pubid>
						<pubid idtype="pmpid" link="fulltext">12850439</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Clustering of housekeeping genes provides a unified model of gene order in the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>Lercher</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Urrutia</snm>
						<fnm>AO</fnm>
					</au>
					<au>
						<snm>Hurst</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2002</pubdate>
				<volume>31</volume>
				<fpage>180</fpage>
				<lpage>183</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng887</pubid>
						<pubid idtype="pmpid" link="fulltext">11992122</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Large clusters of co-expressed genes in the <it>Drosophila </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Boutanaev</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Kalmykova</snm>
						<fnm>AI</fnm>
					</au>
					<au>
						<snm>Shevelyov</snm>
						<fnm>YY</fnm>
					</au>
					<au>
						<snm>Nurminsky</snm>
						<fnm>DI</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>420</volume>
				<fpage>666</fpage>
				<lpage>669</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01216</pubid>
						<pubid idtype="pmpid" link="fulltext">12478293</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Chromosomal clustering of muscle-expressed genes in <it>Caenorhabditis elegans</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Roy</snm>
						<fnm>PJ</fnm>
					</au>
					<au>
						<snm>Stuart</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Lund</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>SK</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>418</volume>
				<fpage>975</fpage>
				<lpage>979</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01012</pubid>
						<pubid idtype="pmpid" link="fulltext">12214599</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Evidence for large domains of similarly expressed genes in the <it>Drosophila </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Spellman</snm>
						<fnm>PT</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>GM</fnm>
					</au>
				</aug>
				<source>J Biol</source>
				<pubdate>2002</pubdate>
				<volume>1</volume>
				<fpage>5</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/1475-4924-1-5</pubid>
						<pubid idtype="pmpid" link="fulltext">12144710</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<aug>
					<au>
						<snm>Cavalier-Smith</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>The Evolution of Genome Size</source>
				<publisher>New York: John Wiley and Sons</publisher>
				<pubdate>1985</pubdate>
			</bibl>
			<bibl id="B14">
				<title>
					<p>So much "junk" DNA in our genome.</p>
				</title>
				<aug>
					<au>
						<snm>Ohno</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>In Evolution of Genetic Systems</source>
				<publisher>New York: Gordon and Breach</publisher>
				<editor>Smith HH</editor>
				<pubdate>1972</pubdate>
				<fpage>366</fpage>
				<lpage>370</lpage>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Transposable elements and the evolution of genome size in eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Kidwell</snm>
						<fnm>MG</fnm>
					</au>
				</aug>
				<source>Genetica</source>
				<pubdate>2002</pubdate>
				<volume>115</volume>
				<fpage>49</fpage>
				<lpage>63</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1023/A:1016072014259</pubid>
						<pubid idtype="pmpid">12188048</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Genome size as a mutation-selection-drift process.</p>
				</title>
				<aug>
					<au>
						<snm>Lozovskaya</snm>
						<fnm>ER</fnm>
					</au>
					<au>
						<snm>Nurminsky</snm>
						<fnm>DI</fnm>
					</au>
					<au>
						<snm>Petrov</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Hartl</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<source>Genes Genet Syst</source>
				<pubdate>1999</pubdate>
				<volume>74</volume>
				<fpage>201</fpage>
				<lpage>207</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1266/ggs.74.201</pubid>
						<pubid idtype="pmpid" link="fulltext">10734601</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Evidence for DNA loss as a determinant of genome size.</p>
				</title>
				<aug>
					<au>
						<snm>Petrov</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Sangster</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Johnston</snm>
						<fnm>JS</fnm>
					</au>
					<au>
						<snm>Hartl</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Shaw</snm>
						<fnm>KL</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>287</volume>
				<fpage>1060</fpage>
				<lpage>1062</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.287.5455.1060</pubid>
						<pubid idtype="pmpid" link="fulltext">10669421</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Mutational equilibrium model of genome size evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Petrov</snm>
						<fnm>DA</fnm>
					</au>
				</aug>
				<source>Theor Popul Biol</source>
				<pubdate>2002</pubdate>
				<volume>61</volume>
				<fpage>531</fpage>
				<lpage>544</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/tpbi.2002.1605</pubid>
						<pubid idtype="pmpid" link="fulltext">12167373</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>The bigger the C-value, the larger the cell: genome size and red blood cell size in vertebrates.</p>
				</title>
				<aug>
					<au>
						<snm>Gregory</snm>
						<fnm>TR</fnm>
					</au>
				</aug>
				<source>Blood Cells Mol Dis</source>
				<pubdate>2001</pubdate>
				<volume>27</volume>
				<fpage>830</fpage>
				<lpage>843</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/bcmd.2001.0457</pubid>
						<pubid idtype="pmpid" link="fulltext">11783946</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Genome size and developmental complexity.</p>
				</title>
				<aug>
					<au>
						<snm>Gregory</snm>
						<fnm>TR</fnm>
					</au>
				</aug>
				<source>Genetica</source>
				<pubdate>2002</pubdate>
				<volume>115</volume>
				<fpage>131</fpage>
				<lpage>146</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1023/A:1016032400147</pubid>
						<pubid idtype="pmpid">12188045</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Lineage-specific regulators couple cell lineage asymmetry to the transcription of the <it>Caenorhabditis elegans </it>POU gene <it>unc-86 </it>during neurogenesis.</p>
				</title>
				<aug>
					<au>
						<snm>Baumeister</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Ruvkun</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Genes Dev</source>
				<pubdate>1996</pubdate>
				<volume>10</volume>
				<fpage>1395</fpage>
				<lpage>1410</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8647436</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Building the heart piece by piece: modularity of <it>cis</it>-elements regulating Nkx2-5 transcription.</p>
				</title>
				<aug>
					<au>
						<snm>Schwartz</snm>
						<fnm>RJ</fnm>
					</au>
					<au>
						<snm>Olson</snm>
						<fnm>EN</fnm>
					</au>
				</aug>
				<source>Development</source>
				<pubdate>1999</pubdate>
				<volume>126</volume>
				<fpage>4187</fpage>
				<lpage>4192</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10477287</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p><it>shaven </it>and <it>sparkling </it>are mutations in separate enhancers of the <it>Drosophila Pax2 </it>homolog.</p>
				</title>
				<aug>
					<au>
						<snm>Fu</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Duan</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Frei</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Noll</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Development</source>
				<pubdate>1998</pubdate>
				<volume>125</volume>
				<fpage>2943</fpage>
				<lpage>2950</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9655816</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Early and late periodic patterns of even skipped expression are controlled by distinct regulatory elements that respond to different spatial cues.</p>
				</title>
				<aug>
					<au>
						<snm>Goto</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Macdonald</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Maniatis</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>1989</pubdate>
				<volume>57</volume>
				<fpage>413</fpage>
				<lpage>422</lpage>
				<xrefbib>
					<pubid idtype="pmpid">2720776</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>The <it>Drosophila Pox neuro </it>gene: control of male courtship behavior and fertility as revealed by a complete dissection of all enhancers.</p>
				</title>
				<aug>
					<au>
						<snm>Boll</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Noll</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Development</source>
				<pubdate>2002</pubdate>
				<volume>129</volume>
				<fpage>5667</fpage>
				<lpage>5681</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1242/dev.00157</pubid>
						<pubid idtype="pmpid" link="fulltext">12421707</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Integration of positional signals and regulation of wing formation and identity by <it>Drosophila vestigial </it>gene.</p>
				</title>
				<aug>
					<au>
						<snm>Kim</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Sebring</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Esch</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Kraus</snm>
						<fnm>ME</fnm>
					</au>
					<au>
						<snm>Vorwerk</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Magee</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Carroll</snm>
						<fnm>SB</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1996</pubdate>
				<volume>382</volume>
				<fpage>133</fpage>
				<lpage>138</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/382133a0</pubid>
						<pubid idtype="pmpid">8700202</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>An extensive 3' regulatory region controls expression of Bmp5 in specific anatomical structures of the mouse embryo.</p>
				</title>
				<aug>
					<au>
						<snm>DiLeone</snm>
						<fnm>RJ</fnm>
					</au>
					<au>
						<snm>Russell</snm>
						<fnm>LB</fnm>
					</au>
					<au>
						<snm>Kingsley</snm>
						<fnm>DM</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>1998</pubdate>
				<volume>148</volume>
				<fpage>401</fpage>
				<lpage>408</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9475750</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Transcriptional regulation of atonal during development of the <it>Drosophila </it>peripheral nervous system.</p>
				</title>
				<aug>
					<au>
						<snm>Sun</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Jan</snm>
						<fnm>LY</fnm>
					</au>
					<au>
						<snm>Jan</snm>
						<fnm>YN</fnm>
					</au>
				</aug>
				<source>Development</source>
				<pubdate>1998</pubdate>
				<volume>125</volume>
				<fpage>3731</fpage>
				<lpage>3740</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9716538</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Systematic determination of patterns of gene expression during <it>Drosophila </it>embryogenesis.</p>
				</title>
				<aug>
					<au>
						<snm>Tomancak</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Beaton</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Weiszmann</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Kwan</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Shu</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Lewis</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Richards</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ashburner</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Hartenstein</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Celniker</snm>
						<fnm>SE</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>research0088.1</fpage>
				<lpage>0088.14</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">12537577</pubid>
						<pubid idtype="doi">10.1186/gb-2002-3-12-research0088</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Promoter-proximal tethering elements regulate enhancer-promoter specificity in the <it>Drosophila </it>Antennapedia complex.</p>
				</title>
				<aug>
					<au>
						<snm>Calhoun</snm>
						<fnm>VC</fnm>
					</au>
					<au>
						<snm>Stathopoulos</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Levine</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>9243</fpage>
				<lpage>9247</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.142291299</pubid>
						<pubid idtype="pmpid" link="fulltext">12093913</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p><it>Cis</it>-regulatory logic in the <it>endo16 </it>gene: switching from a specification to a differentiation mode of control.</p>
				</title>
				<aug>
					<au>
						<snm>Yuh</snm>
						<fnm>CH</fnm>
					</au>
					<au>
						<snm>Bolouri</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Davidson</snm>
						<fnm>EH</fnm>
					</au>
				</aug>
				<source>Development</source>
				<pubdate>2001</pubdate>
				<volume>128</volume>
				<fpage>617</fpage>
				<lpage>629</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11171388</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Disruption of a long-range <it>cis</it>-acting regulator for Shh causes preaxial polydactyly.</p>
				</title>
				<aug>
					<au>
						<snm>Lettice</snm>
						<fnm>LA</fnm>
					</au>
					<au>
						<snm>Horikoshi</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Heaney</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>van Baren</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>van der Linde</snm>
						<fnm>HC</fnm>
					</au>
					<au>
						<snm>Breedveld</snm>
						<fnm>GJ</fnm>
					</au>
					<au>
						<snm>Joosse</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Akarsu</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Oostra</snm>
						<fnm>BA</fnm>
					</au>
					<au>
						<snm>Endo</snm>
						<fnm>N</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>7548</fpage>
				<lpage>7553</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.112212199</pubid>
						<pubid idtype="pmpid" link="fulltext">12032320</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<aug>
					<au>
						<snm>Gerhart</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kirschner</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Cells, Embryos, and Evolution</source>
				<publisher>Malden, MA: Blackwell Science</publisher>
				<pubdate>1997</pubdate>
			</bibl>
			<bibl id="B34">
				<aug>
					<au>
						<snm>Carroll</snm>
						<fnm>SB</fnm>
					</au>
					<au>
						<snm>Grenier</snm>
						<fnm>JK</fnm>
					</au>
					<au>
						<snm>Weatherbee</snm>
						<fnm>SD</fnm>
					</au>
				</aug>
				<source>From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design</source>
				<publisher>Malden, MA: Blackwell Science</publisher>
				<pubdate>2001</pubdate>
			</bibl>
			<bibl id="B35">
				<aug>
					<au>
						<snm>Davidson</snm>
						<fnm>EH</fnm>
					</au>
				</aug>
				<source>Genomic Regulatory Systems: Development and Evolution</source>
				<publisher>San Diego, CA: Academic Press</publisher>
				<pubdate>2001</pubdate>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.</p>
				</title>
				<aug>
					<au>
						<snm>Ashburner</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ball</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Blake</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Botstein</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Butler</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Cherry</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Davis</snm>
						<fnm>AP</fnm>
					</au>
					<au>
						<snm>Dolinski</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Dwight</snm>
						<fnm>SS</fnm>
					</au>
					<au>
						<snm>Eppig</snm>
						<fnm>JT</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2000</pubdate>
				<volume>25</volume>
				<fpage>25</fpage>
				<lpage>29</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/75556</pubid>
						<pubid idtype="pmpid" link="fulltext">10802651</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Panopoulou</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Hennig</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Groth</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Krause</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Poustka</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Herwig</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Vingron</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Lehrach</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>1056</fpage>
				<lpage>1066</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.874803</pubid>
						<pubid idtype="pmpid" link="fulltext">12799346</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Comparative genomics of the eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Rubin</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>Yandell</snm>
						<fnm>MD</fnm>
					</au>
					<au>
						<snm>Wortman</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Gabor Miklos</snm>
						<fnm>GL</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>CR</fnm>
					</au>
					<au>
						<snm>Hariharan</snm>
						<fnm>IK</fnm>
					</au>
					<au>
						<snm>Fortini</snm>
						<fnm>ME</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>PW</fnm>
					</au>
					<au>
						<snm>Apweiler</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fleischmann</snm>
						<fnm>W</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>287</volume>
				<fpage>2204</fpage>
				<lpage>2215</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.287.5461.2204</pubid>
						<pubid idtype="pmpid" link="fulltext">10731134</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Transcription factors and transcriptional regulation.</p>
				</title>
				<aug>
					<au>
						<snm>McGhee</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Krause</snm>
						<fnm>MW</fnm>
					</au>
				</aug>
				<source>In C. elegans II</source>
				<publisher>Plainview, NY: Cold Spring Harbor Laboratory Press</publisher>
				<editor>Riddle DL, Blumenthal T, Meyer BJ, Priess JR</editor>
				<pubdate>1997</pubdate>
				<fpage>147</fpage>
				<lpage>184</lpage>
			</bibl>
			<bibl id="B40">
				<title>
					<p>An extensive 3' <it>cis</it>-regulatory region directs the imaginal disk expression of decapentaplegic, a member of the TGF-beta family in <it>Drosophila</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Blackman</snm>
						<fnm>RK</fnm>
					</au>
					<au>
						<snm>Sanicola</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Raftery</snm>
						<fnm>LA</fnm>
					</au>
					<au>
						<snm>Gillevet</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Gelbart</snm>
						<fnm>WM</fnm>
					</au>
				</aug>
				<source>Development</source>
				<pubdate>1991</pubdate>
				<volume>111</volume>
				<fpage>657</fpage>
				<lpage>666</lpage>
				<xrefbib>
					<pubid idtype="pmpid">1908769</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>Pattern-specific expression of the <it>Drosophila decapentaplegic </it>gene in imaginal disks is regulated by 3' <it>cis</it>-regulatory elements.</p>
				</title>
				<aug>
					<au>
						<snm>Masucci</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Miltenberger</snm>
						<fnm>RJ</fnm>
					</au>
					<au>
						<snm>Hoffmann</snm>
						<fnm>FM</fnm>
					</au>
				</aug>
				<source>Genes Dev</source>
				<pubdate>1990</pubdate>
				<volume>4</volume>
				<fpage>2011</fpage>
				<lpage>2023</lpage>
				<xrefbib>
					<pubid idtype="pmpid">2177439</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>The <it>even-skipped </it>locus is contained in a 16-kb chromatin domain.</p>
				</title>
				<aug>
					<au>
						<snm>Sackerson</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Fujioka</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Goto</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Dev Biol</source>
				<pubdate>1999</pubdate>
				<volume>211</volume>
				<fpage>39</fpage>
				<lpage>52</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/dbio.1999.9301</pubid>
						<pubid idtype="pmpid" link="fulltext">10373303</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B43">
				<title>
					<p>A global analysis of <it>Caenorhabditis elegans </it>operons.</p>
				</title>
				<aug>
					<au>
						<snm>Blumenthal</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Evans</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Link</snm>
						<fnm>CD</fnm>
					</au>
					<au>
						<snm>Guffanti</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Lawson</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Thierry-Mieg</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Thierry-Mieg</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Chiu</snm>
						<fnm>WL</fnm>
					</au>
					<au>
						<snm>Duke</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kiraly</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>SK</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>417</volume>
				<fpage>851</fpage>
				<lpage>854</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature00831</pubid>
						<pubid idtype="pmpid" link="fulltext">12075352</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B44">
				<title>
					<p>The COG database: an updated version includes eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Fedorova</snm>
						<fnm>ND</fnm>
					</au>
					<au>
						<snm>Jackson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Jacobs</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Kiryutin</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Krylov</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Mazumder</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Mekhedov</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Nikolskaya</snm>
						<fnm>AN</fnm>
					</au>
					<etal/>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>41</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/1471-2105-4-41</pubid>
						<pubid idtype="pmpid" link="fulltext">12969510</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>Insulators and boundaries: versatile regulatory elements in the eukaryotic genome.</p>
				</title>
				<aug>
					<au>
						<snm>Bell</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>West</snm>
						<fnm>AG</fnm>
					</au>
					<au>
						<snm>Felsenfeld</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2001</pubdate>
				<volume>291</volume>
				<fpage>447</fpage>
				<lpage>450</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.291.5503.447</pubid>
						<pubid idtype="pmpid" link="fulltext">11228144</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>Ras pathway specificity is determined by the integration of multiple signal-activated and tissue-restricted transcription factors.</p>
				</title>
				<aug>
					<au>
						<snm>Halfon</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Carmena</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Gisselbrecht</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Sackerson</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Jimenez</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Baylies</snm>
						<fnm>MK</fnm>
					</au>
					<au>
						<snm>Michelson</snm>
						<fnm>AM</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>2000</pubdate>
				<volume>103</volume>
				<fpage>63</fpage>
				<lpage>74</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11051548</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>Overlapping activators and repressors delimit transcriptional response to receptor tyrosine kinase signals in the <it>Drosophila </it>eye.</p>
				</title>
				<aug>
					<au>
						<snm>Xu</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Kauffmann</snm>
						<fnm>RC</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kladny</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Carthew</snm>
						<fnm>RW</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>2000</pubdate>
				<volume>103</volume>
				<fpage>87</fpage>
				<lpage>97</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11051550</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B48">
				<title>
					<p>Combinatorial signaling in the specification of unique cell fates.</p>
				</title>
				<aug>
					<au>
						<snm>Flores</snm>
						<fnm>GV</fnm>
					</au>
					<au>
						<snm>Duan</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Yan</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Nagaraj</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fu</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Zou</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Noll</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Banerjee</snm>
						<fnm>U</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>2000</pubdate>
				<volume>103</volume>
				<fpage>75</fpage>
				<lpage>85</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11051549</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B49">
				<title>
					<p>Multiple enhancers contribute to expression of the NK-2 homeobox gene <it>ceh-22 </it>in <it>C. elegans </it>pharyngeal muscle.</p>
				</title>
				<aug>
					<au>
						<snm>Kuchenthal</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Okkema</snm>
						<fnm>PG</fnm>
					</au>
				</aug>
				<source>Genesis</source>
				<pubdate>2001</pubdate>
				<volume>31</volume>
				<fpage>156</fpage>
				<lpage>166</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/gene.10018</pubid>
						<pubid idtype="pmpid" link="fulltext">11783006</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B50">
				<title>
					<p>Regulation and function of the <it>Drosophila </it>segmentation gene <it>fushi tarazu</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Hiromi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Gehring</snm>
						<fnm>WJ</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>1987</pubdate>
				<volume>50</volume>
				<fpage>963</fpage>
				<lpage>974</lpage>
				<xrefbib>
					<pubid idtype="pmpid">2887293</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B51">
				<title>
					<p><it>Cis</it>-regulation downstream of cell type specification: a single compact element controls the complex expression of the CyIIa gene in sea urchin embryos.</p>
				</title>
				<aug>
					<au>
						<snm>Arnone</snm>
						<fnm>MI</fnm>
					</au>
					<au>
						<snm>Martin</snm>
						<fnm>EL</fnm>
					</au>
					<au>
						<snm>Davidson</snm>
						<fnm>EH</fnm>
					</au>
				</aug>
				<source>Development</source>
				<pubdate>1998</pubdate>
				<volume>125</volume>
				<fpage>1381</fpage>
				<lpage>1395</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9502720</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B52">
				<title>
					<p>What controls the length of noncoding DNA?</p>
				</title>
				<aug>
					<au>
						<snm>Comeron</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Curr Opin Genet Dev</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>652</fpage>
				<lpage>659</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0959-437X(00)00249-5</pubid>
						<pubid idtype="pmpid" link="fulltext">11682309</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B53">
				<title>
					<p>High rate of DNA loss in the <it>Drosophila melanogaster </it>and <it>Drosophila virilis </it>species groups.</p>
				</title>
				<aug>
					<au>
						<snm>Petrov</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Hartl</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1998</pubdate>
				<volume>15</volume>
				<fpage>293</fpage>
				<lpage>302</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9501496</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B54">
				<title>
					<p>Trash DNA is what gets thrown away: high rate of DNA loss in <it>Drosophila</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Petrov</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Hartl</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>1997</pubdate>
				<volume>205</volume>
				<fpage>279</fpage>
				<lpage>289</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(97)00516-7</pubid>
						<pubid idtype="pmpid">9461402</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B55">
				<title>
					<p>The large <it>srh </it>family of chemoreceptor genes in <it>Caenorhabditis </it>nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses.</p>
				</title>
				<aug>
					<au>
						<snm>Robertson</snm>
						<fnm>HM</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2000</pubdate>
				<volume>10</volume>
				<fpage>192</fpage>
				<lpage>203</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.10.2.192</pubid>
						<pubid idtype="pmpid" link="fulltext">10673277</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B56">
				<title>
					<p>Molecular melodies in high and low C.</p>
				</title>
				<aug>
					<au>
						<snm>Hartl</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<source>Nat Rev Genet</source>
				<pubdate>2000</pubdate>
				<volume>1</volume>
				<fpage>145</fpage>
				<lpage>149</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35038580</pubid>
						<pubid idtype="pmpid" link="fulltext">11253654</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B57">
				<title>
					<p>How intron splicing affects the deletion and insertion profile in <it>Drosophila melanogaster</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Ptak</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Petrov</snm>
						<fnm>DA</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2002</pubdate>
				<volume>162</volume>
				<fpage>1233</fpage>
				<lpage>1244</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12454069</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B58">
				<title>
					<p>Analysis of conserved noncoding DNA in <it>Drosophila </it>reveals similar constraints in intergenic and intronic sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Bergman</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Kreitman</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>1335</fpage>
				<lpage>1345</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.178701</pubid>
						<pubid idtype="pmpid" link="fulltext">11483574</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B59">
				<title>
					<p>Assessing the impact of comparative genomic sequence data on the functional annotation of the <it>Drosophila </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Bergman</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Pfeiffer</snm>
						<fnm>BD</fnm>
					</au>
					<au>
						<snm>Rincon-Limas</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Hoskins</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Gnirke</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mungall</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Kronmiller</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Pacleb</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Park</snm>
						<fnm>S</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>research0086.1</fpage>
				<lpage>0086.20</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">12537575</pubid>
						<pubid idtype="doi">10.1186/gb-2002-3-12-research0086</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B60">
				<title>
					<p>Population, evolutionary and genomic consequences of interference selection.</p>
				</title>
				<aug>
					<au>
						<snm>Comeron</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Kreitman</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2002</pubdate>
				<volume>161</volume>
				<fpage>389</fpage>
				<lpage>410</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12019253</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B61">
				<title>
					<p>The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes.</p>
				</title>
				<aug>
					<au>
						<snm>Versteeg</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>van Schaik</snm>
						<fnm>BD</fnm>
					</au>
					<au>
						<snm>van Batenburg</snm>
						<fnm>MF</fnm>
					</au>
					<au>
						<snm>Roos</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Monajemi</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Caron</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Bussemaker</snm>
						<fnm>HJ</fnm>
					</au>
					<au>
						<snm>van Kampen</snm>
						<fnm>AH</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>1998</fpage>
				<lpage>2004</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.1649303</pubid>
						<pubid idtype="pmpid" link="fulltext">12915492</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B62">
				<title>
					<p>Annotation of the <it>Drosophila melanogaster </it>euchromatic genome: a systematic review.</p>
				</title>
				<aug>
					<au>
						<snm>Misra</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Crosby</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Mungall</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Matthews</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Campbell</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Hradecky</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kaminker</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Millburn</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Prochnik</snm>
						<fnm>S</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>research0083.1</fpage>
				<lpage>0083.22</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">12537572</pubid>
						<pubid idtype="doi">10.1186/gb-2002-3-12-research0083</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B63">
				<title>
					<p>WormBase: a cross-species database for comparative genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Harris</snm>
						<fnm>TW</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Schwarz</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Bradnam</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Lawson</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Blasier</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Kenny</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Cunningham</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Kishore</snm>
						<fnm>R</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>133</fpage>
				<lpage>137</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/nar/gkg053</pubid>
						<pubid idtype="pmpid" link="fulltext">12519966</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B64">
				<title>
					<p>WormBase</p>
				</title>
				<url>http://www.wormbase.org</url>
			</bibl>
			<bibl id="B65">
				<title>
					<p>FlyBase</p>
				</title>
				<url>http://flybase.bio.indiana.edu</url>
			</bibl>
			<bibl id="B66">
				<title>
					<p>BDGP <it>in situ </it>homepage</p>
				</title>
				<url>http://www.fruitfly.org/cgi-bin/ex/insitu.pl</url>
			</bibl>
			<bibl id="B67">
				<title>
					<p>Basic local alignment search tool.</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Gish</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Myers</snm>
						<fnm>EW</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1990</pubdate>
				<volume>215</volume>
				<fpage>403</fpage>
				<lpage>410</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
						<pubid idtype="pmpid" link="fulltext">2231712</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B68">
				<title>
					<p>Ensembl 2002: accommodating comparative genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Clamp</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Andrews</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Barker</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Bevan</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Cameron</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Clark</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Cox</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Cuff</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Curwen</snm>
						<fnm>V</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>38</fpage>
				<lpage>42</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/nar/gkg083</pubid>
						<pubid idtype="pmpid" link="fulltext">12519943</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
