<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2005-6-10-r88</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Method</dochead>
		<bibl>
			<title>
				<p>Searching for differentially expressed gene combinations</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Dettling</snm>
					<fnm>Marcel</fnm>
					<insr iid="I1"/>
					<email>dettling@jhu.edu</email>
				</au>
				<au id="A2">
					<snm>Gabrielson</snm>
					<fnm>Edward</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>egabriel@jhmi.edu</email>
				</au>
				<au id="A3">
					<snm>Parmigiani</snm>
					<fnm>Giovanni</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<insr iid="I3"/>
					<email>gp@jhu.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21205, USA</p>
				</ins>
				<ins id="I2">
					<p>Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21205, USA</p>
				</ins>
				<ins id="I3">
					<p>Department of Biostatistics, Johns Hopkins Medical Institutions, Baltimore, MD 21205, USA</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2005</pubdate>
			<volume>6</volume>
			<issue>10</issue>
			<fpage>R88</fpage>
			<url>http://genomebiology.com/2005/6/10/R88</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">16207359</pubid><pubid idtype="doi">10.1186/gb-2005-6-10-r88</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>4</day>
					<month>4</month>
					<year>2005</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>23</day>
					<month>6</month>
					<year>2005</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>8</day>
					<month>8</month>
					<year>2005</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>19</day>
					<month>9</month>
					<year>2005</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2005</year>
			<collab>Dettling et al.; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<shorttitle>
			<p>Finding differentially expressed gene combinations</p>
		</shorttitle>
		<shortabs>
			<p>CorScor is a novel approach to identifying gene pairs with joint differential expression. It can be used to detect phenotype-related dependencies and interactions among genes.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<p>We propose 'CorScor', a novel approach for identifying gene pairs with joint differential expression. This is defined as a situation with good phenotype discrimination in the bivariate, but not in the two marginal distributions. CorScor can be used to detect phenotype-related dependencies and interactions among genes. Our easily interpretable approach is scalable to current microarray dimensions and yields promising results on several cancer-gene-expression datasets.</p>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010003">Cancer</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Gene-expression monitoring by microarray technologies has become an important approach in biological and medical research over the past decade. A common experimental design is the comparison of two sets of samples from different phenotypes (for example, diseased and normal tissue), with the goal of searching for genes showing differential expression. This is usually done via statistical testing procedures and, often, subsequent multiple testing corrections. Prominent examples include <it>t</it>-testing, significance analysis of microarrays <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, and empirical Bayes analysis <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. A comprehensive review of such approaches can be found in Pan <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. All these methods use a one-gene-at-a-time strategy, considering only the association between single genes and the phenotype.</p>
			<p>Many approaches for classification of phenotypes using microarrays do consider multiple genes simultaneously, but they address a different question, as their goal is to produce sets of differentially expressed genes for use in class prediction <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. While interesting, these approaches cannot be applied comprehensively to all possible pairs. As a result, there are currently no practical tools for exploring phenotype-related dependencies and interactions among all gene pairs in large datasets. In this paper we present a methodology for addressing this issue, and we show that it can find interesting biological relationships that would be missed by existing approaches.</p>
			<p>We are interested in searching for two types of gene pairs, illustrated in Figure <figr fid="F1">1</figr> by artificial examples. In the left panel, the two genes show a pronounced joint association with the phenotype: if the sum of their expression levels exceeds 3 units, we observe solely the blue-triangle phenotype. A biological mechanism leading to this phenomenon may occur when the two genes are substitutes in a molecular process that is closely linked to the phenotype. Therefore, we denote this situation as the 'substitution case'. Neither of the two genes shows a strong association with the phenotype in the univariate marginal distribution, and thus both would have been highly unlikely to appear in a gene list produced by a one-gene-at-a-time testing approach. A complementary case occurs when two genes cluster around two positively sloped axes: then the phenotype is associated with a difference in expression, a situation we refer to as the 'gap case'.</p>
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Two artificial examples of joint differential gene expression</p>
				</caption>
				<text>
					<p>Two artificial examples of joint differential gene expression. The units of the <it>x</it>-axis and <it>y</it>-axis are gene expression; blue triangles and red circles represent samples of two different phenotypes. The inner panels reflect the joint distribution; the outer margins display the univariate marginal distributions. The dashed lines represent the first principal components, conditional on the phenotype.</p>
				</text>
				<graphic file="gb-2005-6-10-r88-1"/>
			</fig>
			<p>A more complex case is shown in our second artificial example, in the right panel of Figure <figr fid="F1">1</figr>. There is no obvious demarcation in space and, again, neither of the two genes carries information on its own. However, together they do. Biologically speaking, this example could reflect an 'on/off situation'. If both genes are off (expression values below 1.5 units), or both genes are on (expression values above 1.5 units), we observe the red-circle phenotype. In contrast, if only one of the genes is turned on, the blue-triangle phenotype is predominant.</p>
			<p>Statistically, we define joint differential expression as good phenotype discrimination by the joint distribution, but not by the univariate marginal distributions of two genes. From a functional genomics perspective, such pairs could represent interesting novel biological interactions, for example, genes that are in the same pathway.</p>
			<p>The identification of gene pairs with joint differential expression is ambitious for several reasons. First, gene pair identification is subject to the curse of dimensionality. While the usual number <it>p </it>of genes is in the tens of thousands, the number of gene pairs is <it>p</it>(<it>p</it>-1)/2, usually in the millions. Second, there are no existing and quickly computable test statistics that exactly address our notion of joint differential expression. Existing bivariate tests such as Hotelling's <it>T</it><sup>2 </sup><abbrgrp><abbr bid="B9">9</abbr></abbrgrp> only screen for differences in the bivariate mean vectors and will thus favor pairs that consist of genes with strong marginal effects. Third, identifying joint differential expression based on comparing predictive models for pairs and single genes is conceptually sound but is unattractive because of its prohibitive computational burden.</p>
			<p>Here we propose a novel, efficient, and scalable approach for searching for gene pairs with joint differential expression. It relies on calculating an appropriately defined test statistic from the unconditional as well as the class-conditional correlation matrices. Therefore, we call our method CorScor, as a shorthand for correlation scoring. Its biggest advantages are its straightforward interpretation and the fact that it can be calculated very quickly, which allows for an exhaustive search among the millions of pairs even in large gene-expression datasets. On the basis of several gene-expression datasets from the literature, we illustrate our method and collect empirical evidence that it yields gene pairs that have a tendency to share biological relationships.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Data preparation</p>
				</st>
				<p>We illustrate the power and utility of our method with a comprehensive analysis of two datasets, and display the results for two further problems in the additional data files section. The first dataset discussed in detail is from a publicly available study on colon cancer by Alon <it>et al. </it><abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. It originated from Affymetrix Hum6000 arrays and contains the expression values of the 2,000 genes with highest minimal intensity across 62 colon tissues, 40 of which were tumorous and 22 of which were normal. We transformed the data by a base 10 log-transformation and standardized each array to zero mean and unit variance across genes. The second is a publicly available breast cancer dataset from Hedenfalk <it>et al. </it><abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. The data were obtained from Stanford-type cDNA microarrays, monitoring 2,654 genes across 22 breast cancer samples, 7 of which were found to carry germline <it>BRCA1 </it>mutations. Normalization was carried out following the approach of Yang <it>et al. </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Our selection of data illustrates that CorScor works independently of the platform. We require accurately preprocessed expression data from <it>n </it>samples and <it>p </it>genes, stored in an (<it>n </it>&#215; <it>p</it>) matrix denoted by (<it>x</it><sub><it>ig</it></sub>). In what follows, we will encode the phenotype information generically as 0 and 1, and store it in the <it>n</it>-dimensional response variable <it>y</it>.</p>
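				<p>The preprocessing described above for the colon data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NumPy, the function name preprocess, and the random toy intensities are not part of the original analysis, and the population standard deviation is assumed for the standardization.</p>

```python
import numpy as np

def preprocess(raw):
    """Log10-transform positive intensities, then standardize each array.

    raw is an (n x p) matrix: one row per array (sample), one column per gene.
    Each row is scaled to zero mean and unit variance across genes
    (population standard deviation; an assumption of this sketch).
    """
    logged = np.log10(raw)
    centered = logged - logged.mean(axis=1, keepdims=True)
    return centered / logged.std(axis=1, keepdims=True)

# Hypothetical toy data: 6 arrays, 50 genes, strictly positive intensities.
rng = np.random.default_rng(0)
raw = rng.uniform(10.0, 1000.0, size=(6, 50))
x = preprocess(raw)                      # the (n x p) matrix (x_ig)
y = np.array([0, 0, 0, 1, 1, 1])         # phenotype encoded as 0 and 1
```

				<p>The resulting pair (x, y) has the layout assumed in the remainder of this section: samples in rows, genes in columns, and a 0/1 response vector.</p>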
			</sec>
			<sec>
				<st>
					<p>The gap/substitution cases</p>
				</st>
				<p>Our method for revealing genes with joint differential expression relies on computing a simple score function. Given a pair consisting of genes <it>g </it>and <it>g'</it>, we determine a measure of pairwise dependence <it>&#961;</it>(<it>g</it>,<it>g'</it>) among their expression vectors. Next, by restricting in turn to just the samples from each phenotype, we obtain both class-conditional measures of dependence <it>&#961;</it><sub>0</sub>(<it>g</it>,<it>g'</it>) and <it>&#961;</it><sub>1</sub>(<it>g</it>,<it>g'</it>).</p>
				<p>For finding gene pairs that jointly discriminate the two phenotypes according to a gap or substitution mechanism as shown by the artificial example in the left panel of Figure <figr fid="F1">1</figr>, we recommend computing the scoring function</p>
				<p><it>S</it>(<it>&#961;</it>,<it>&#961;</it><sub>0</sub>,<it>&#961;</it><sub>1</sub>) = | <it>&#961;</it><sub>0 </sub>+ <it>&#961;</it><sub>1 </sub>- &#945;<it>&#961;</it> | &#160;&#160;&#160; (1)</p>
				<p>for all gene pairs (<it>g</it>,<it>g'</it>), using the Pearson correlation coefficient as dependence measure. Note that the operations in function (1) can be done for all gene pairs simultaneously by element-wise operations on three (<it>p </it>&#215; <it>p</it>) matrices. As illustrated in Figure <figr fid="F2">2</figr>, gene pairs with high scores indeed show good joint differential expression on the colon and <it>BRCA1 </it>data, that is, accurate phenotype discrimination and comparatively uninformative marginals. Some of the gene pairs we found are correlated in one group but not in the other. While this behavior does not exactly match the prototype example from Figure <figr fid="F1">1</figr>, it still fits our definition of joint differential expression. Moreover, this loss of coregulation can be a biologically relevant feature.</p>
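				<p>The element-wise computation of scoring function (1) might look as follows in Python. This is a minimal sketch, not the authors' implementation: NumPy, the function names gap_score and corscor_gap, and the simulated toy data are assumptions made for illustration only.</p>

```python
import numpy as np

def gap_score(rho, rho0, rho1, alpha=1.5):
    """Scoring function (1); works on scalars or element-wise on matrices."""
    return np.abs(rho0 + rho1 - alpha * rho)

def corscor_gap(x, y, alpha=1.5):
    """Score every gene pair of the (n x p) matrix x; y holds 0/1 labels."""
    rho = np.corrcoef(x, rowvar=False)            # unconditional Pearson
    rho0 = np.corrcoef(x[y == 0], rowvar=False)   # samples with y == 0
    rho1 = np.corrcoef(x[y == 1], rowvar=False)   # samples with y == 1
    return gap_score(rho, rho0, rho1, alpha)

# Hypothetical toy data: genes 0 and 1 follow two parallel, shifted axes.
rng = np.random.default_rng(1)
n, p, shift = 40, 5, 1.0
y = np.repeat([0, 1], n // 2)
x = rng.normal(size=(n, p))
t = rng.normal(scale=2.0, size=n)                 # position along the axes
sign = np.where(y == 0, -1.0, 1.0)
x[:, 0] = t + sign * shift + rng.normal(scale=0.1, size=n)
x[:, 1] = t - sign * shift + rng.normal(scale=0.1, size=n)

scores = corscor_gap(x, y)
rows, cols = np.triu_indices(p, k=1)              # each pair once, no diagonal
best = int(np.argmax(scores[rows, cols]))
top_pair = (int(rows[best]), int(cols[best]))
```

				<p>The score matrix is symmetric and its diagonal carries no information, so only the upper triangle is ranked. For real data with tens of thousands of genes, the three correlation matrices would have to fit in memory or be processed in blocks, a detail this sketch ignores.</p>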
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Six examples of joint differential gene expression of the gap/substitution type, obtained from the colon and <it>BRCA1 </it>datasets</p>
					</caption>
					<text>
						<p>Six examples of joint differential gene expression of the gap/substitution type, obtained from the colon and <it>BRCA1 </it>datasets. The inner panels show the joint distribution; the outer margins display the univariate distributions. Blue triangles stand for cancers in colon and <it>BRCA1 </it>mutants in breast; the red circles stand for normal samples in colon and sporadic cancers in breast. The dashed lines represent the conditional first principal components.</p>
					</text>
					<graphic file="gb-2005-6-10-r88-2"/>
				</fig>
				<p>The rationale for the success of scoring function (1) is as follows. High conditional correlations arise if the data points within each group are tightly aligned along a straight line, which can be represented by the first principal components, shown in Figure <figr fid="F2">2</figr> by the dashed lines. Good joint differential expression requires such tight clustering and close-to-parallel axis alignment. Hence, high conditional correlations with concordant sign, and also a shift between the alignment axes, are necessary. The bigger this shift, and thus the clearer the joint separation, the lower the unconditional correlation <it>&#961;</it> gets. Hence, we diminish the sum of <it>&#961;</it><sub>0 </sub>and <it>&#961;</it><sub>1 </sub>by &#945;<it>&#961;</it>. By taking the absolute value, we achieve symmetric treatment of positively and negatively sloped alignment axes, that is, we can capture the gap and the substitution cases together. The scalar tuning parameter &#945; governs the balance between separation and parallel alignment. We observed empirically good results with &#945; &#8712; [1, 2], and use &#945; = 1.5 throughout the paper.</p>
				<p>The first three columns in Table <tblr tid="T1">1</tblr> show the values of <it>&#961;</it>, <it>&#961;</it><sub>0</sub>, <it>&#961;</it><sub>1</sub>, and the scoring function <it>S </it>for the three highest-scoring gene pairs according to the scoring function (1). As expected, the class-conditional correlations <it>&#961;</it><sub>0 </sub>and <it>&#961;</it><sub>1 </sub>tend to be high in absolute value and concordant in their signs, whereas the overall correlation is low, and sometimes even has a discordant sign.</p>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Correlation coefficients and CorScor values for the gap/substitution scenario</p>
					</caption>
					<tblbdy cols="7">
						<r>
							<c>
								<p/>
							</c>
							<c cspan="3" ca="center">
								<p>Colon</p>
							</c>
							<c cspan="3" ca="center">
								<p>
									<it>BRCA1</it>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="3">
								<hr/>
							</c>
							<c cspan="3">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Pair 1</p>
							</c>
							<c ca="center">
								<p>Pair 2</p>
							</c>
							<c ca="center">
								<p>Pair 3</p>
							</c>
							<c ca="center">
								<p>Pair 1</p>
							</c>
							<c ca="center">
								<p>Pair 2</p>
							</c>
							<c ca="center">
								<p>Pair 3</p>
							</c>
						</r>
						<r>
							<c cspan="7">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>&#961;</it>
								</p>
							</c>
							<c ca="center">
								<p>0.19</p>
							</c>
							<c ca="center">
								<p>-0.01</p>
							</c>
							<c ca="center">
								<p>0.02</p>
							</c>
							<c ca="center">
								<p>0.27</p>
							</c>
							<c ca="center">
								<p>0.32</p>
							</c>
							<c ca="center">
								<p>0.31</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>&#961;</it>
									<sub>0</sub>
								</p>
							</c>
							<c ca="center">
								<p>0.84</p>
							</c>
							<c ca="center">
								<p>0.65</p>
							</c>
							<c ca="center">
								<p>0.67</p>
							</c>
							<c ca="center">
								<p>-0.79</p>
							</c>
							<c ca="center">
								<p>-0.20</p>
							</c>
							<c ca="center">
								<p>-0.38</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>&#961;</it>
									<sub>1</sub>
								</p>
							</c>
							<c ca="center">
								<p>0.53</p>
							</c>
							<c ca="center">
								<p>0.33</p>
							</c>
							<c ca="center">
								<p>0.34</p>
							</c>
							<c ca="center">
								<p>-0.63</p>
							</c>
							<c ca="center">
								<p>-0.96</p>
							</c>
							<c ca="center">
								<p>-0.78</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>S</it>(<it>&#961;</it>,<it>&#961;</it><sub>0</sub>,<it>&#961;</it><sub>1</sub>)</p>
							</c>
							<c ca="center">
								<p>1.09</p>
							</c>
							<c ca="center">
								<p>0.99</p>
							</c>
							<c ca="center">
								<p>0.98</p>
							</c>
							<c ca="center">
								<p>1.82</p>
							</c>
							<c ca="center">
								<p>1.64</p>
							</c>
							<c ca="center">
								<p>1.62</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Conditional and unconditional correlation coefficients, as well as the value of the scoring function from Equation (1) with &#945; = 1.5, for the top three gene pairs in both the colon and the <it>BRCA1 </it>data.</p>
					</tblfn>
				</tbl>
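				<p>As a plausibility check, the scores in Table 1 can be approximately recomputed from the rounded correlations listed there, using Equation (1) with &#945; = 1.5. The worked example below uses the first pair of each dataset; small deviations from the tabulated scores are to be expected because the inputs are rounded to two decimals.</p>

```latex
\begin{gather*}
S(\rho,\rho_0,\rho_1) = \lvert \rho_0 + \rho_1 - \alpha\rho \rvert, \qquad \alpha = 1.5 \\
\text{Colon, pair 1: } \lvert 0.84 + 0.53 - 1.5 \times 0.19 \rvert = \lvert 1.370 - 0.285 \rvert = 1.085 \approx 1.09 \\
\text{\textit{BRCA1}, pair 1: } \lvert -0.79 - 0.63 - 1.5 \times 0.27 \rvert = \lvert -1.420 - 0.405 \rvert = 1.825 \approx 1.82
\end{gather*}
```

				<p>In the <it>BRCA1 </it>example the unconditional correlation is positive while both conditional correlations are negative, so subtracting &#945;<it>&#961;</it> increases the magnitude of the score.</p>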
				<p>A concise visualization of the scores of gene pairs with joint differential expression is a heat map, as shown in Figure <figr fid="F3">3</figr>. We select the first 50 genes involved in the top-ranked gene pairs and color-code the score for all 50<sup>2</sup>/2 = 1,250 gene pairs from black (low value) through shaded grey to white (high value, excellent joint differential expression). Rows and columns of this symmetric matrix are rearranged according to a hierarchical clustering, such that genes that share common joint differential expression properties lie adjacent. We hypothesize that clustered genes tend to share a biological relationship. An exploratory analysis on the colon data supports this: the most prominent feature is a group of genes that can be found at positions 39 to 45 of the matrix. It consists of the genes with HUGO symbols <it>GSN</it>, <it>ACTN1</it>, <it>SPARCL1</it>, <it>ITGA7</it>, <it>TPM1</it>, and <it>COL6A2</it>.</p>
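				<p>The gene-selection step that precedes the heat map could be sketched as follows. The function name top_genes and the tiny score matrix are hypothetical, and the sketch covers only the selection of genes from the top-ranked pairs; the hierarchical clustering used to reorder rows and columns is not shown.</p>

```python
import numpy as np

def top_genes(score, k=50):
    """Collect the first k distinct genes met when walking down the
    gene pairs in order of decreasing score."""
    rows, cols = np.triu_indices(score.shape[0], k=1)
    order = np.argsort(score[rows, cols])[::-1]    # best pair first
    chosen = []
    for i in order:
        for g in (int(rows[i]), int(cols[i])):
            if g not in chosen:
                chosen.append(g)
        if len(chosen) >= k:
            break
    return chosen[:k]

# Hypothetical symmetric score matrix for four genes.
score = np.full((4, 4), 0.1)
score[0, 1] = score[1, 0] = 0.9
score[2, 3] = score[3, 2] = 0.8
genes = top_genes(score, k=3)
heat = score[np.ix_(genes, genes)]                 # sub-matrix to display
```

				<p>The sub-matrix heat contains the scores among the selected genes and would be the input to a clustering and plotting routine of the reader's choice.</p>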
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Symmetric heat map of CorScor values from Equation (1), for the colon and <it>BRCA1 </it>data</p>
					</caption>
					<text>
						<p>Symmetric heat map of CorScor values from Equation (1), for the colon and <it>BRCA1 </it>data. Columns and rows are rearranged according to a hierarchical clustering. Displayed are the 50 genes that are involved in the pairs with the highest scores. Black stands for low, grey for intermediate, and white for high score.</p>
					</text>
					<graphic file="gb-2005-6-10-r88-3"/>
				</fig>
				<p>Three of these six genes (<it>GSN</it>, <it>ACTN1</it>, and <it>ITGA7</it>) share a common annotation in the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>). They are all involved in the 'regulation of actin cytoskeleton'. The remaining three genes lack pathway annotation in KEGG, but an analysis of their Gene Ontology terms (GO <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>) still reveals a functional connection: <it>TPM1 </it>has the GO terms 'actin binding' and 'cytoskeleton'. <it>SPARCL1 </it>is involved in 'calcium ion binding', a term it shares with <it>GSN </it>and <it>ACTN1</it>.</p>
				<p>The heat map of the <it>BRCA1 </it>data, shown in the right panel of Figure <figr fid="F3">3</figr>, does not show an equally pronounced block structure. The absence of KEGG annotation for a large proportion of the genes makes it challenging to carry out the same type of validation. However, consistent with the known DNA-binding function of the <it>BRCA1 </it>gene <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, many of the genes are related to binding activities. For a full overview of the genes involved in the heat maps, we refer readers to our supplementary web page <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
				<p>Our findings on the colon data illustrate that CorScor has the potential to bring up gene pairs with a functional relationship, and that our heat maps are a helpful visualization tool for grouping and detecting the most important ones among them. The major benefit of CorScor, compared with established clustering techniques based on the expression values of single genes, is that we are able to capture genes without strong marginal effects. The genes involved in our pairs do not show pronounced fold changes across the phenotypes, but nevertheless seem to be key in molecular processes closely linked to the phenotype.</p>
			</sec>
			<sec>
				<st>
					<p>The on/off-case</p>
				</st>
				<p>Another scenario in which joint differential expression is important is illustrated with the artificial example in the right panel of Figure <figr fid="F1">1</figr>. While the marginal distributions are not informative, the joint distribution clearly is: one phenotype is prevalent when the expression of both genes is either turned on or turned off, whereas the other phenotype is predominant when only one of the genes is expressed. An effective scoring function to capture these gene pairs is</p>
				<p><it>S</it>(<it>&#961;</it>,<it>&#961;</it><sub>0</sub>,<it>&#961;</it><sub>1</sub>) = | <it>&#961;</it><sub>1 </sub>- <it>&#961;</it><sub>0 </sub>|, &#160;&#160;&#160; (2)</p>
				<p>the absolute difference of the class-conditional dependence measures <it>&#961;</it><sub>0 </sub>and <it>&#961;</it><sub>1</sub>. We use Spearman's rank correlations in (2), because this prevents outlier-driven situations from appearing among the top gene pairs. Table <tblr tid="T2">2</tblr> shows the values of <it>&#961;</it><sub>0</sub>, <it>&#961;</it><sub>1 </sub>and <it>S </it>for the top-scoring gene pairs in the colon and <it>BRCA1 </it>data. We observe fairly high conditional correlations here, which is partly caused by the use of Spearman's rank correlation.</p>
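				<p>Scoring function (2) with Spearman's rank correlation might be computed along the following lines. The sketch assumes NumPy and continuous expression values without ties (ranks are obtained by a double argsort, which breaks ties by position); the function names and the toy matrix are illustrative only.</p>

```python
import numpy as np

def spearman_matrix(x):
    """Spearman correlation of all column pairs: Pearson on column ranks.
    Assumes no ties within a column."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def corscor_onoff(x, y):
    """Scoring function (2): absolute difference of the class-conditional
    rank correlations, for all gene pairs at once."""
    rho0 = spearman_matrix(x[y == 0])
    rho1 = spearman_matrix(x[y == 1])
    return np.abs(rho1 - rho0)

# Hypothetical toy data: two genes that rise together in class 0
# and move in opposite directions in class 1.
x = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0],
              [0.0, 3.0], [1.0, 2.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = corscor_onoff(x, y)
```

				<p>In this toy example the two marginal distributions are identical across classes, yet the pair attains the maximal possible score of 2, because the rank correlation flips from +1 to -1.</p>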
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Correlation coefficients and CorScor values for the on/off scenario</p>
					</caption>
					<tblbdy cols="7">
						<r>
							<c>
								<p/>
							</c>
							<c cspan="3" ca="center">
								<p>Colon</p>
							</c>
							<c cspan="3" ca="center">
								<p>
									<it>BRCA1</it>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="3">
								<hr/>
							</c>
							<c cspan="3">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Pair 1</p>
							</c>
							<c ca="center">
								<p>Pair 2</p>
							</c>
							<c ca="center">
								<p>Pair 3</p>
							</c>
							<c ca="center">
								<p>Pair 1</p>
							</c>
							<c ca="center">
								<p>Pair 2</p>
							</c>
							<c ca="center">
								<p>Pair 3</p>
							</c>
						</r>
						<r>
							<c cspan="7">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>&#961;</it>
									<sub>0</sub>
								</p>
							</c>
							<c ca="center">
								<p>0.54</p>
							</c>
							<c ca="center">
								<p>0.48</p>
							</c>
							<c ca="center">
								<p>-0.72</p>
							</c>
							<c ca="center">
								<p>0.86</p>
							</c>
							<c ca="center">
								<p>0.93</p>
							</c>
							<c ca="center">
								<p>0.89</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>&#961;</it>
									<sub>1</sub>
								</p>
							</c>
							<c ca="center">
								<p>-0.67</p>
							</c>
							<c ca="center">
								<p>-0.68</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
							<c ca="center">
								<p>-1.00</p>
							</c>
							<c ca="center">
								<p>-0.93</p>
							</c>
							<c ca="center">
								<p>-0.95</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>S</it>(<it>&#961;</it>,<it>&#961;</it><sub>0</sub>,<it>&#961;</it><sub>1</sub>)</p>
							</c>
							<c ca="center">
								<p>1.21</p>
							</c>
							<c ca="center">
								<p>1.17</p>
							</c>
							<c ca="center">
								<p>1.13</p>
							</c>
							<c ca="center">
								<p>1.86</p>
							</c>
							<c ca="center">
								<p>1.86</p>
							</c>
							<c ca="center">
								<p>1.84</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Conditional correlation coefficients, as well as the value of the scoring function from Equation (2), for the top three gene pairs in both the colon and the <it>BRCA1 </it>data.</p>
					</tblfn>
				</tbl>
				<p>Figure <figr fid="F4">4</figr> shows scatterplots of the highest-scoring gene pairs on the colon and <it>BRCA1 </it>data. Joint differential expression is clearly present and an interesting biological interpretation can be derived from these scatterplots. As an example, we discuss the best-scoring gene pair from the <it>BRCA1 </it>data: for the wild-type samples (represented by red circles), there is a high positive correlation between <it>TAF12</it>, a gene that is related to transcription initiation, and <it>RB1</it>, a transcription inhibitor. For the <it>BRCA1 </it>mutant samples, the situation is reversed and the two genes show a strong negative correlation. This observation suggests a specific nuclear pathway that may be distorted as a result of <it>BRCA1 </it>mutations.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Six examples of joint differential gene expression according to the on/off-scenario, obtained from the colon and <it>BRCA1 </it>data</p>
					</caption>
					<text>
						<p>Six examples of joint differential gene expression according to the on/off-scenario, obtained from the colon and <it>BRCA1 </it>data. The inner panels show the joint distribution; the outer margins display the univariate distributions. Blue triangles stand for cancerous samples in colon and <it>BRCA1 </it>mutants in breast; the red circles stand for normal samples in colon and <it>BRCA1 </it>wild types in breast. The dashed lines represent the direction of the conditional first principal components.</p>
					</text>
					<graphic file="gb-2005-6-10-r88-4"/>
				</fig>
				<p>We emphasize again that because of the very different scope, such findings could not be made with one-at-a-time gene selection and/or hierarchical clustering based on gene-expression values. Again, for this on/off-scenario, the full information and annotation of the genes that are involved in the most promising gene pairs are available from our supplementary website <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Permutation analysis</p>
				</st>
				<p>Next, we address the question of whether and how many gene pairs achieve promising score values by chance alone. We do this by performing permutation-based empirical Bayes analysis <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. We generate 100 noise gene-expression datasets by scrambling the phenotype labels. We then run CorScor on each of these 100 noise datasets, obtain a vector of score values with length <it>p</it>(<it>p</it>-1)/2 and rank their values. By taking the average within rank over the 100 permutations, we obtain an estimated null distribution of CorScor values.</p>
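				<p>The permutation procedure described above can be sketched as follows, here for scoring function (1). This is an illustration under assumptions rather than the code used for the paper: NumPy, the function names, the random toy data, and the reduced number of permutations are all hypothetical choices.</p>

```python
import numpy as np

def gap_scores(x, y, alpha=1.5):
    """Scores from function (1) for all p(p-1)/2 pairs, as a flat vector."""
    iu = np.triu_indices(x.shape[1], k=1)
    rho = np.corrcoef(x, rowvar=False)
    rho0 = np.corrcoef(x[y == 0], rowvar=False)
    rho1 = np.corrcoef(x[y == 1], rowvar=False)
    return np.abs(rho0 + rho1 - alpha * rho)[iu]

def permutation_null(x, y, n_perm=100, seed=0):
    """Scramble the labels n_perm times, sort the scores of each run,
    and average within rank across runs."""
    rng = np.random.default_rng(seed)
    runs = [np.sort(gap_scores(x, rng.permutation(y))) for _ in range(n_perm)]
    return np.mean(runs, axis=0)

def tail_fraction(null, observed):
    """Fraction of null scores that exceed an observed score."""
    return float(np.mean(null > observed))

# Hypothetical toy data: 20 samples, 6 genes, no real signal.
rng = np.random.default_rng(2)
x = rng.normal(size=(20, 6))
y = np.repeat([0, 1], 10)
null = permutation_null(x, y, n_perm=20)
observed = gap_scores(x, y)
fractions = [tail_fraction(null, s) for s in observed]
```

				<p>Permuting the labels keeps the group sizes fixed, so each noise dataset has the same class balance as the original data.</p>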
				<p>The histograms in Figure <figr fid="F5">5</figr> display the right tail of the permutation distribution to the right of the 95% quantile. The dashed vertical lines mark the score value of the top three gene pairs (shown in Figures <figr fid="F2">2</figr> and <figr fid="F4">4</figr>) for both the gap/substitution and the on/off situations, and for both datasets. For the top gene pairs, we also give the fraction of null scores that exceed the observed values, which is an approximation to the empirical false-discovery rate. The permutation distribution has a somewhat heavier tail and slower decay for the on/off situation. Furthermore, when comparing the colon and <it>BRCA1 </it>permutation scores, we observe that the latter have higher values. This is caused by the difference in sample size. When we arbitrarily restricted the colon dataset to the same size as the <it>BRCA1 </it>dataset, the score values were in the same range (data not shown).</p>
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Histograms displaying the right tail of the permutation distributions of CorScor in the colon and <it>BRCA1 </it>data</p>
					</caption>
					<text>
						<p>Histograms displaying the right tail of the permutation distributions of CorScor in the colon and <it>BRCA1 </it>data. The dashed vertical lines indicate the score values of the top three gene pairs from Figures 2 and 4. Also reported is the fraction of null scores (tail.p) that exceed each of the observed values.</p>
					</text>
					<graphic file="gb-2005-6-10-r88-5"/>
				</fig>
				<p>Table <tblr tid="T3">3</tblr> shows the number of gene pairs that exceed a given quantile of the permutation distribution, together with the ratio of observed versus expected number of gene pairs exceeding these quantiles. Here again, we observe that more gene pairs reach very high significance levels in the gap/substitution scenario. In general, our results confirm that it is unlikely that the gene pairs we report have their joint differential expression due to chance alone.</p>
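				<p>The observed-versus-expected ratios in Table 3 can be reproduced approximately by hand. The calculation below is for the colon data (<it>p </it>= 2,000 genes) at the 10<sup>-6</sup> level, and it assumes that the expected count is simply the tail probability multiplied by the number of gene pairs; this reading is consistent with the tabulated values but is not spelled out in the text.</p>

```latex
\begin{gather*}
N = \frac{p(p-1)}{2} = \frac{2000 \times 1999}{2} = 1{,}999{,}000 \\
E = q \times N = 10^{-6} \times 1{,}999{,}000 \approx 2.0 \\
\text{G/S: } \frac{2204}{2.0} \approx 1.1 \times 10^{3}, \qquad \text{O/O: } \frac{1}{2.0} = 0.5
\end{gather*}
```

				<p>Under this reading, a ratio far above 1 indicates many more high-scoring pairs than the permutation distribution would produce, whereas a ratio near or below 1 indicates no excess.</p>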
				<tbl id="T3">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Gene pairs exceeding quantiles</p>
					</caption>
					<tblbdy cols="9">
						<r>
							<c>
								<p/>
							</c>
							<c cspan="2" ca="center">
								<p>Colon: G/S</p>
							</c>
							<c cspan="2" ca="center">
								<p>Colon: O/O</p>
							</c>
							<c cspan="2" ca="center">
								<p><it>BRCA1</it>: G/S</p>
							</c>
							<c cspan="2" ca="center">
								<p><it>BRCA1</it>: O/O</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="8">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Quantile</p>
							</c>
							<c ca="center">
								<p>#</p>
							</c>
							<c ca="center">
								<p>o/e</p>
							</c>
							<c ca="center">
								<p>#</p>
							</c>
							<c ca="center">
								<p>o/e</p>
							</c>
							<c ca="center">
								<p>#</p>
							</c>
							<c ca="center">
								<p>o/e</p>
							</c>
							<c ca="center">
								<p>#</p>
							</c>
							<c ca="center">
								<p>o/e</p>
							</c>
						</r>
						<r>
							<c cspan="9">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1,446</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>10<sup>-6</sup></p>
							</c>
							<c ca="center">
								<p>2,204</p>
							</c>
							<c ca="center">
								<p>1.1&#183;10<sup>3</sup></p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5.0&#183;10<sup>-1</sup></p>
							</c>
							<c ca="center">
								<p>45</p>
							</c>
							<c ca="center">
								<p>1.3&#183;10<sup>1</sup></p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>2.3&#183;10<sup>0</sup></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>10<sup>-5</sup></p>
							</c>
							<c ca="center">
								<p>5,917</p>
							</c>
							<c ca="center">
								<p>3.0&#183;10<sup>2</sup></p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>5.5&#183;10<sup>-1</sup></p>
							</c>
							<c ca="center">
								<p>444</p>
							</c>
							<c ca="center">
								<p>1.3&#183;10<sup>1</sup></p>
							</c>
							<c ca="center">
								<p>69</p>
							</c>
							<c ca="center">
								<p>2.0&#183;10<sup>0</sup></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>10<sup>-4</sup></p>
							</c>
							<c ca="center">
								<p>11,260</p>
							</c>
							<c ca="center">
								<p>5.6&#183;10<sup>1</sup></p>
							</c>
							<c ca="center">
								<p>167</p>
							</c>
							<c ca="center">
								<p>8.4&#183;10<sup>-1</sup></p>
							</c>
							<c ca="center">
								<p>2,473</p>
							</c>
							<c ca="center">
								<p>7.0&#183;10<sup>0</sup></p>
							</c>
							<c ca="center">
								<p>584</p>
							</c>
							<c ca="center">
								<p>1.7&#183;10<sup>0</sup></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>10<sup>-3</sup></p>
							</c>
							<c ca="center">
								<p>22,701</p>
							</c>
							<c ca="center">
								<p>1.1&#183;10<sup>1</sup></p>
							</c>
							<c ca="center">
								<p>1,924</p>
							</c>
							<c ca="center">
								<p>9.6&#183;10<sup>-1</sup></p>
							</c>
							<c ca="center">
								<p>12,488</p>
							</c>
							<c ca="center">
								<p>3.6&#183;10<sup>0</sup></p>
							</c>
							<c ca="center">
								<p>5,063</p>
							</c>
							<c ca="center">
								<p>1.4&#183;10<sup>0</sup></p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The number of gene pairs (#) that exceed a given quantile of the permutation distribution in the data for colon and <it>BRCA1</it>, along with the ratio of observed versus expected (o/e) number of gene pairs exceeding this threshold. The abbreviations G/S and O/O refer to the scoring function: G/S, gap/substitution scenario; O/O, on/off scoring situation.</p>
					</tblfn>
				</tbl>
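				<p>The observed/expected ratios in Table 3 can be checked with simple arithmetic. The following Python sketch is an illustration only: it assumes that the expected count under the null is the tail level multiplied by the total number of gene pairs, and that the colon data comprise about 2 million gene pairs (roughly 2,000 genes), a value that is not stated in this section but is implied by the colon entries of the table. The function names are hypothetical.</p>
				<p>
```python
def tail_fraction(null_scores, observed):
    """Fraction of permutation (null) scores that exceed an observed score."""
    exceed = sum(1 for s in null_scores if s > observed)
    return exceed / len(null_scores)


def observed_over_expected(n_observed, n_pairs, level):
    """Ratio of observed to expected gene pairs beyond a null quantile.

    Assumption: under the null, level * n_pairs pairs are expected
    to exceed the quantile with upper-tail probability `level`.
    """
    expected = level * n_pairs
    return n_observed / expected


# Assumed number of colon gene pairs (about 2,000 genes; hypothetical here).
n_pairs_colon = 2000 * 1999 // 2

# Observed colon counts taken from Table 3 (gap/substitution columns).
ratio_1e6 = observed_over_expected(2204, n_pairs_colon, 1e-6)
ratio_1e5 = observed_over_expected(5917, n_pairs_colon, 1e-5)
print(round(ratio_1e6), round(ratio_1e5))
```
				</p>
				<p>Under these assumptions the two ratios come out close to the tabulated 1.1&#183;10<sup>3</sup> and 3.0&#183;10<sup>2</sup>; small deviations are expected because the table entries are rounded.</p>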
			</sec>
			<sec>
				<st>
					<p>Comparison with predictive modeling</p>
				</st>
				<p>Next, we contrast the results of searching for jointly differentially expressed gene pairs by CorScor with an alternative search based on predictive modeling, implemented with logistic regression. This is also a novel method, although some ideas in this direction were presented in a conference talk by P. Wirapati <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The predictive-modeling approach is far more computationally intensive and currently not applicable to arrays with tens of thousands of features. We chose the following procedure for our predictive-modeling search. In the gap/substitution situation and for each gene pair (<it>g</it>,<it>g'</it>), we fitted three logistic regression models: a model with both genes as additive inputs to capture bivariate differential expression, and two univariate models with each gene as input to capture the marginal separation. This generates conditional probability estimates <it>p</it><sub><it>i</it></sub>(<it>x</it><sub><it>g</it></sub>, <it>x</it><sub><it>g'</it></sub>), <it>p</it><sub><it>i</it></sub>(<it>x</it><sub><it>g</it></sub>), and <it>p</it><sub><it>i</it></sub>(<it>x</it><sub><it>g'</it></sub>) for each observation <it>i</it>. We then compute three log-likelihoods on the basis of these probabilities,</p>
				<p>
					<graphic file="gb-2005-6-10-r88-i1.gif"/>
				</p>
				<p>The log-likelihood is a very natural measure for the amount of discrimination in binary problems. A gene pair with good joint differential expression reflecting a gap or substitution should show good discrimination for the bivariate model but comparably poor discrimination for the single-gene models. Hence, we can define a scoring function based on predictive modeling as</p>
				<p>
					<graphic file="gb-2005-6-10-r88-i2.gif"/>
				</p>
				<p>The left two panels in Figure <figr fid="F6">6</figr> show scatterplots of CorScor's outcome versus predictive-modeling scores in the gap/substitution situation. The correlation between the two measures is 0.39 for the colon data, and 0.30 for the <it>BRCA1 </it>data.</p>
				<fig id="F6">
					<title>
						<p>Figure 6</p>
					</title>
					<caption>
						<p>Comparison of CorScor and predictive modeling scores</p>
					</caption>
					<text>
						<p>Comparison of CorScor and predictive modeling scores. Density plots for a comparison of the gap/substitution scoring function from correlation scoring defined in Equation (1) and predictive modeling (Equation (4)), as well as the on/off objective measures defined in Equations (2) and (5). Each panel is divided into a 50-&#215;-50-cell grid. The darker the color of a cell, the more instances are therein. In the figure header, cor is the Pearson correlation coefficient between the CorScor and the respective predictive modeling scores.</p>
					</text>
					<graphic file="gb-2005-6-10-r88-6"/>
				</fig>
				<p>The on/off-scenario requires a different approach. For each gene pair (<it>g</it>,<it>g'</it>), we chose to measure the improvement in predictive accuracy when comparing a full two-gene interaction model versus a two-gene additive model. This requires generating conditional probability estimates <it>p</it><sub><it>i</it></sub>(<it>x</it><sub><it>g</it></sub>,<it>x</it><sub><it>g'</it></sub>,<it>x</it><sub><it>gg'</it></sub>) and <it>p</it><sub><it>i</it></sub>(<it>x</it><sub><it>g</it></sub>, <it>x</it><sub><it>g'</it></sub>) using logistic regression for each observation <it>i</it>. These are then plugged into the log-likelihood from (3). From these, we can obtain a predictive-modeling-based scoring function for the on/off scenario via</p>
				<p><it>T</it>(<it>g</it>,<it>g'</it>) = <it>l</it>(<it>y</it>,<it>p</it>(<it>x</it><sub><it>g</it></sub>,<it>x</it><sub><it>g'</it></sub>,<it>x</it><sub><it>gg'</it></sub>)) - <it>l</it>(<it>y</it>,<it>p</it>(<it>x</it><sub><it>g</it></sub>,<it>x</it><sub><it>g'</it></sub>)) &#160;&#160;&#160; (5)</p>
				<p>The concordance of this measure with CorScor's output is illustrated in the right two panels of Figure <figr fid="F6">6</figr>. We observe a correlation of 0.54 in the colon data and 0.29 in the <it>BRCA1 </it>data, but many of CorScor's top-scoring gene pairs are not identified by predictive modeling.</p>
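				<p>To make the on/off predictive-modeling score of Equation (5) concrete, the following Python sketch fits the additive and the interaction logistic models for a single gene pair and returns the difference of their log-likelihoods. It is a minimal illustration under stated assumptions, not the code used for the results reported here: the log-likelihood is taken to be the standard Bernoulli log-likelihood, the interaction input <it>x</it><sub><it>gg'</it></sub> is taken to be the product of the two standardized expression values, and the models are fitted by plain gradient ascent. All function names are hypothetical.</p>
				<p>
```python
import math


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _standardize(v):
    # Center to mean 0 and scale to unit standard deviation.
    m = sum(v) / len(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / len(v)) or 1.0
    return [(a - m) / sd for a in v]


def log_likelihood(y, p, eps=1e-12):
    """Bernoulli log-likelihood l(y, p); assumed form of Equation (3)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


def fit_logistic(rows, y, start=None, steps=3000, lr=0.2):
    """Fit a logistic regression by gradient ascent (intercept is added)."""
    X = [[1.0] + list(r) for r in rows]
    beta = list(start) if start is not None else [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            resid = yi - _sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j, x in enumerate(xi):
                grad[j] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    probs = [_sigmoid(sum(b * x for b, x in zip(beta, xi))) for xi in X]
    return beta, probs


def onoff_predictive_score(xg, xh, y):
    """T(g, g') = l(interaction model) - l(additive model); xh stands for g'."""
    a, b = _standardize(xg), _standardize(xh)
    additive = [[u, v] for u, v in zip(a, b)]
    full = [[u, v, u * v] for u, v in zip(a, b)]
    beta_add, p_add = fit_logistic(additive, y)
    # Warm-start the larger model from the additive fit (interaction weight 0).
    _, p_full = fit_logistic(full, y, start=beta_add + [0.0])
    return log_likelihood(y, p_full) - log_likelihood(y, p_add)


# Toy pair: positive correlation in class 1, negative correlation in class 0.
xg = [-2, -1, 1, 2, -2, -1, 1, 2]
xh = [-2, -1, 1, 2, 2, 1, -1, -2]
y = [1, 1, 1, 1, 0, 0, 0, 0]
score = onoff_predictive_score(xg, xh, y)
print(round(score, 2))
```
				</p>
				<p>In this toy pair the additive model cannot separate the classes, whereas the product term can, so the score is large. Repeating such fits for every gene pair is what makes the approach costly; an iteratively reweighted least-squares fit would normally replace the simple gradient ascent used in this sketch.</p>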
				<p>For further investigation of these differences between CorScor and logistic regression, we performed a simulation study that makes it possible to judge differences in the power for detecting joint differential expression. We adopt a scenario similar to the colon data, with two phenotypes of 22 and 40 samples each. For the gap/substitution situation, the gene expressions for the two phenotypes are simulated independently according to a bivariate normal distribution with conditional correlations of 0.6. The amount of joint differential expression is controlled via a shift in the means on both axes, staggered at <graphic file="gb-2005-6-10-r88-i3.gif"/> standard deviations. We consider the gene pairs without mean shift (and thus with overlapping data point clouds) as the null situation without joint differential expression. The situation with <graphic file="gb-2005-6-10-r88-i4.gif"/> standard deviations of mean shift approximately corresponds to the amount of joint differential expression in the best gene pairs from the colon data. We generated 100 such gene pairs, determined the score values for CorScor and logistic regression, and display the ability to detect joint differential expression using receiver operating characteristic (ROC) curves in Figure <figr fid="F7">7</figr>. We observe that logistic regression does better for the slight mean shifts, but for a moderate to large amount of joint differential expression, the two methods perform equally well.</p>
				<fig id="F7">
					<title>
						<p>Figure 7</p>
					</title>
					<caption>
						<p>Power analysis for detecting joint differential expression</p>
					</caption>
					<text>
						<p>Power analysis for detecting joint differential expression. Receiver operating characteristic (ROC) curves that display the fraction of false positives, or discriminatory ability, in our simulation study to detect joint differential expression. The left panel summarizes information about the gap/substitution scenario; the right panel is about the on/off scenario. The solid lines correspond to CorScor, and the dashed ones, to logistic regression. Finally, the strength of joint differential expression was set at five different levels in our simulation experiment. The yellow lines are for the weakest amount of joint differential expression and the black lines, for the strongest amount.</p>
					</text>
					<graphic file="gb-2005-6-10-r88-7"/>
				</fig>
				<p>For the on/off-scenario, the gene expressions for the two phenotypes are also simulated from independent normal distributions, but without mean shift. The amount of joint differential expression is controlled by the conditional correlations, positive for one phenotype, negative for the other. The correlation coefficients are staggered at values of <graphic file="gb-2005-6-10-r88-i5.gif"/> with a correlation of zero corresponding to the null situation without joint differential expression and a value of <graphic file="gb-2005-6-10-r88-i6.gif"/> being representative of the best pairs we see in true datasets. The right panel in Figure <figr fid="F7">7</figr> displays the ROC curves for these simulations. We observe only slight differences between logistic regression and CorScor. Both methods show good power for detecting gene pairs with strong joint differential expression as they are found in true microarray datasets. In summary, we conclude that CorScor is as powerful at detecting relevant amounts of joint differential expression as logistic regression, but has a markedly lower computational cost.</p>
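				<p>The two simulation designs described in this section can be sketched in a few lines of Python. The sketch is illustrative and rests on assumptions: the size of the mean shift (2 standard deviations), its direction (across the correlation axis), the on/off correlation magnitude (0.6) and the random seed are placeholder values chosen here, not the values used in the study, and the function names are hypothetical. Only the sample sizes of 22 and 40 and the conditional correlation of 0.6 in the gap/substitution design are taken from the text.</p>
				<p>
```python
import math
import random


def simulate_class(n, rho, mean, rng):
    """Draw n points from a bivariate normal with unit variances and correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mean[0] + z1)
        ys.append(mean[1] + rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(0)

# Gap/substitution: equal conditional correlations, shifted means.
shift = 2.0  # hypothetical mean shift, in standard deviations
gap_a = simulate_class(22, 0.6, (0.0, 0.0), rng)
gap_b = simulate_class(40, 0.6, (shift, -shift), rng)  # assumed shift direction

# On/off: no mean shift, conditional correlations of opposite sign.
rho = 0.6  # hypothetical correlation magnitude
onoff_a = simulate_class(22, rho, (0.0, 0.0), rng)
onoff_b = simulate_class(40, -rho, (0.0, 0.0), rng)

for name, (a, b) in {"gap": (gap_a, gap_b), "on/off": (onoff_a, onoff_b)}.items():
    pooled = pearson(a[0] + b[0], a[1] + b[1])
    # Conditional correlation per phenotype, then the pooled correlation.
    print(name, round(pearson(*a), 2), round(pearson(*b), 2), round(pooled, 2))
```
				</p>
				<p>Each printed line lists the two conditional correlations followed by the pooled correlation of one simulated gene pair. Setting the shift or the correlation magnitude to zero gives the corresponding null situation.</p>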
			</sec>
			<sec>
				<st>
					<p>Software</p>
				</st>
				<p>All our computations were implemented in the statistical programming language R <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Via its function <it>cor</it>, it provides a very convenient and efficient routine for estimating Pearson and Spearman gene-pair correlation coefficients from an expression matrix. In the colon and <it>BRCA1 </it>data, an exhaustive search across all gene pairs with CorScor takes about 5 seconds on a 1.5 GHz Intel-Pentium-powered personal computer with 512 MB of RAM.</p>
				<p>All our code for identifying gene pairs with joint differential expression, as well as for their visualization by scatterplots and heat maps, is available as a documented package named <it>corscor</it>, and will be submitted to the Bioconductor project <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Links and updates can also be found on our supplementary website <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
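				<p>For readers who do not use R, the following Python sketch shows what estimating Pearson and Spearman correlations for every gene pair of an expression matrix involves. It is not part of the <it>corscor</it> package and makes no attempt to match the speed of R's <it>cor</it>; the tiny expression matrix and all function names are made up for illustration.</p>
				<p>
```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while len(order) > i:
        j = i
        while len(order) > j + 1 and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def pairwise_correlations(expr, method=pearson):
    """Correlation for every gene pair; expr maps gene name to expression values."""
    genes = sorted(expr)
    return {(g, h): method(expr[g], expr[h]) for g, h in combinations(genes, 2)}


expr = {  # made-up expression matrix: three genes, five samples
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.0, 4.0, 9.0, 16.0, 25.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
}
rho_p = pairwise_correlations(expr, method=pearson)
rho_s = pairwise_correlations(expr, method=spearman)
print(rho_p[("geneA", "geneB")], rho_s[("geneA", "geneB")])
```
				</p>
				<p>Applying the same function separately to the samples of each phenotype and to all samples together yields the conditional and unconditional correlations on which the scoring functions are based.</p>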
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>In a recent paper, Xiao and colleagues <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> considered multivariate searches for differentially expressed gene combinations. Their goal was to uncover subsets of predefined size <it>k </it>for which the multivariate distributions of expression differ between the two phenotypes. Similar ideas were used by the same group in the context of data exploration and variable selection <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The goal of their approach is to uncover sets that potentially consist of combinations of joint and marginally differentially expressed genes. This is a different goal from that considered here. For example, in Figure <figr fid="F4">4</figr>, vertically shifting all the blue points would increase multivariate difference but leave the on/off scores from Equation (2) unchanged. Here, we emphasize the search for interactions per se, because of the clearer functional genomics implications, though high multivariate distance can also be of interest. The Xiao <it>et al</it>. approach is computationally demanding because each set is evaluated by an additional cross-validation. Comprehensive exploration of all pairs is challenging and stochastic search is necessary for subsets of three or more.</p>
			<p>In the section 'Comparison with predictive modeling', we presented an approach to screening for joint differential expression based on predictive modeling. While this shares the scope of CorScor, it is not scalable to the current dimensions of gene-expression data. A full search with predictive modeling on the colon or the <it>BRCA1 </it>data with fewer than 3,000 genes each requires about two weeks of central processing unit time, whereas CorScor needs only about 5 seconds. Since the number of gene pairs and thus the computing time grows quadratically with the number of genes, the analysis of a roughly quintupled Affymetrix HGU133 array with more than 12,000 genes would increase the computing time by a factor of roughly 25, making the predictive-modeling approach prohibitive for practical application. We also observed that the gene pairs found by CorScor and by the predictive-modeling approach differ. To develop a better sense of the nature of the differences, we visually compared a large number of gene pairs from the two methods (not shown). The scatterplots of the top gene pairs according to the gap/substitution predictive-modeling scoring function in Equation (4) reveal that the predictive approach is very sensitive to outliers, whereas CorScor is more robust in this regard. Additionally, the joint separation is often more pronounced with CorScor. In the on/off search, visual scatterplot inspection and examination of gene annotations favor CorScor further. The predictive-modeling objective function in Equation (5) does not seem to exactly match the scope of its correlation-based counterpart and generally did not yield any gene pairs that could serve as indicators for aberrant molecular processes.</p>
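			<p>The quadratic growth of the search space mentioned above is easy to verify. In the following Python sketch, the array size of 2,400 genes is a hypothetical stand-in for an array with fewer than 3,000 genes, chosen so that 12,000 genes is exactly a fivefold increase.</p>
			<p>
```python
def n_pairs(n_genes):
    """Number of unordered gene pairs among n_genes genes."""
    return n_genes * (n_genes - 1) // 2


small = n_pairs(2400)   # hypothetical array with fewer than 3,000 genes
large = n_pairs(12000)  # roughly quintupled array
factor = large / small
print(small, large, round(factor, 1))  # prints: 2878800 71994000 25.0
```
			</p>
			<p>If computing time is proportional to the number of pairs, a search that takes about two weeks would then take on the order of 350 days.</p>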
			<p>In the on/off search, in particular, a critical difference is in the fact that pairs can show strong evidence of a reversal in the sign of the conditional correlations, while still having a substantial overlap of the two conditional distributions (see for example the top left and top right pairs in Figure <figr fid="F4">4</figr>). This can lead to a high CorScor value, but leads to only a moderate predictive score, and a small multivariate distance. These cases, however, can be highly relevant biologically, and it is important to be able to identify them. In conclusion, of the two approaches that we are proposing and investigating here, CorScor is the simpler and more efficient computationally, and it also appears to identify gene pairs that are more promising candidates for a detailed biological analysis.</p>
			<p>Another tool for finding interactions among gene pairs is relevance networks <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. They examine interactions among genes by thresholding covariance matrices and graphically displaying the connections among the genes whose correlations exceed the threshold. We investigated a different type of gene interaction here, namely interactions that are altered as a result of the phenotype comparison of interest. However, the type of visualization implemented in relevance networks could also be used to represent the findings of our algorithm. Moreover, our approach was illustrated here using Pearson's and Spearman's correlations, but the general idea can be extended straightforwardly to any easily computed measure of pairwise association among gene expression levels. Finally, Zhou <it>et al. </it><abbrgrp><abbr bid="B26">26</abbr></abbrgrp> introduced second-order expression correlations that investigate regulatory networks by exploring variation of correlations across conditions. Whereas their method focuses on concordant correlations, our approach is based on correlation differences.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>In summary, this paper presents a novel approach for finding gene pairs with joint differential expression. This represents a complement to the widely used one-gene-at-a-time testing approaches and the associated list-enrichment tests. The idea behind joint differential expression is to find genes that only in pairs, and not individually, discriminate two given phenotypes. These pairs make it possible to explore dependence and interaction among genes, as well as to screen for molecular processes that are linked to disease. Since the usual number of gene pairs is in the millions, there is a need for a quickly computable criterion. We propose two scoring functions, based on conditional and unconditional correlation coefficients. We show that these measures have the ability to uncover gene pairs that show promising scatterplot patterns and tend to share a biological relationship. In cancer research, a strength of CorScor lies in its potential ability to find genes that have not traditionally been involved with cancer, as they may represent new avenues for cancer cell biology and, more importantly, therapeutic intervention.</p>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>The following additional data are available with the online version of this paper. To provide further evidence for the general applicability of the CorScor approach, we provide empirical results for four additional microarray problems as additional data files. Additional data file <supplr sid="S1">1</supplr> is from a publicly available leukemia study by Armstrong <it>et al. </it><abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. The data originated from Affymetrix HG U95A arrays and, after our normalization, feature the expression of 6,177 genes across a total of 72 samples. For the CorScor analysis, we restricted the analysis to the binary distinction of 24 samples from acute lymphoblastic leukemias (ALL) versus 28 samples from acute myeloid leukemias (AML).</p>
			<p>Additional data file <supplr sid="S2">2</supplr> is based on a dataset from a publicly available lung cancer study of Bhattacharjee <it>et al. </it><abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. It also originated from Affymetrix HG U95A arrays and contains 3,171 genes after our normalization. The CorScor analysis was run on 20 carcinoid samples and 17 normal lung tissues. Additional data file <supplr sid="S3">3</supplr> is a dataset from the seminal leukemia study of Golub <it>et al. </it><abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. It originated from Affymetrix Hu6800 arrays. The version we used after our normalization contained the expression of 3,571 genes across a total of 72 samples, 25 of which were from patients who had acute myeloid leukemias and 47 of which were from patients with acute lymphoblastic leukemia. Additional data file <supplr sid="S4">4</supplr> is our analysis of publicly available cDNA arrays from Gruvberger <it>et al. </it><abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. The data in Additional data file <supplr sid="S4">4</supplr> monitor 3,389 genes across 30 estrogen-receptor-negative and 28 estrogen-receptor-positive breast cancer samples.</p>
			<p>The scatterplots in the additional data files clearly show the presence of joint differential expression for the gap/substitution situation in all four datasets. Again, our idea works here because the red and blue data points are tightly aligned along their respective principal component, yielding good conditional correlation. On the other hand, the two phenotypes are separated, resulting in a low overall correlation. Also, the scatterplots for the on/off-situation clearly show the presence of joint differential expression, and they confirm that there are gene pairs with reverse correlation in the case and control samples.</p>
			<p>In the tables in the additional data files, we report the results from the permutation test on each of the four datasets. They are qualitatively similar to the ones from the colon and <it>BRCA1 </it>data shown in Table <tblr tid="T3">3</tblr>, meaning that, again, the real gene pairs score sufficiently better than the random ones.</p>
			<suppl id="S1">
				<title>
					<p>Additional File 1</p>
				</title>
				<caption>
					<p>Data from a publicly available leukemia study by Armstrong <it>et al.</it></p>
				</caption>
				<text>
					<p>Data from a publicly available leukemia study by Armstrong <it>et al. </it><abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. The data originated from Affymetrix HG U95A arrays and, after our normalization, feature the expression of 6,177 genes across a total of 72 samples. For the CorScor analysis, we restricted the analysis to the binary distinction of 24 samples from acute lymphoblastic leukemias (ALL) versus 28 samples from acute myeloid leukemias (AML).</p>
				</text>
				<file name="gb-2005-6-10-r88-S1.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S2">
				<title>
					<p>Additional File 2</p>
				</title>
				<caption>
					<p>A dataset from a publicly available lung cancer study of Bhattacharjee <it>et al.</it></p>
				</caption>
				<text>
					<p>A dataset from a publicly available lung cancer study of Bhattacharjee <it>et al. </it><abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. It also originated from Affymetrix HG U95A arrays and contains 3,171 genes after our normalization. The CorScor analysis was run on 20 carcinoid samples and 17 normal lung tissues.</p>
				</text>
				<file name="gb-2005-6-10-r88-S2.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S3">
				<title>
					<p>Additional File 3</p>
				</title>
				<caption>
					<p>A dataset from the seminal leukemia study of Golub <it>et al.</it></p>
				</caption>
				<text>
					<p>A dataset from the seminal leukemia study of Golub <it>et al. </it><abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. It originated from Affymetrix Hu6800 arrays. The version we used after our normalization contained the expression of 3,571 genes across a total of 72 samples, 25 of which were from patients who had acute myeloid leukemias and 47 of which were from patients with acute lymphoblastic leukemia.</p>
				</text>
				<file name="gb-2005-6-10-r88-S3.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S4">
				<title>
					<p>Additional File 4</p>
				</title>
				<caption>
					<p>Our analysis of publicly available cDNA arrays from Gruvberger <it>et al.</it></p>
				</caption>
				<text>
					<p>Our analysis of publicly available cDNA arrays from Gruvberger <it>et al. </it><abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. The data monitor 3,389 genes across 30 estrogen-receptor-negative and 28 estrogen-receptor-positive breast cancer samples.</p>
				</text>
				<file name="gb-2005-6-10-r88-S4.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>Work supported by NSF grant NSF034211, by the Johns Hopkins SPORE in breast cancer P50CA88843 and GI cancer P50CA62924, and by core grant P30CA06973. We thank Ben Ho Park for his useful comments.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Significance analysis of microarrays applied to the ionizing radiation response.</p>
				</title>
				<aug>
					<au>
						<snm>Tusher</snm>
						<fnm>VG</fnm>
					</au>
					<au>
						<snm>Tibshirani</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Chu</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2001</pubdate>
				<volume>98</volume>
				<fpage>5116</fpage>
				<lpage>5121</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">33173</pubid>
						<pubid idtype="pmpid" link="fulltext">11309499</pubid>
						<pubid idtype="doi">10.1073/pnas.091062498</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Empirical Bayes analysis of a microarray experiment.</p>
				</title>
				<aug>
					<au>
						<snm>Efron</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Tibshirani</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Storey</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Tusher</snm>
						<fnm>V</fnm>
					</au>
				</aug>
				<source>J Am Stat Assoc</source>
				<pubdate>2001</pubdate>
				<volume>96</volume>
				<fpage>1151</fpage>
				<lpage>1160</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1198/016214501753382129</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments.</p>
				</title>
				<aug>
					<au>
						<snm>Pan</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>546</fpage>
				<lpage>554</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.4.546</pubid>
						<pubid idtype="pmpid" link="fulltext">12016052</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Classification in microarray experiments.</p>
				</title>
				<aug>
					<au>
						<snm>Dudoit</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Fridlyand</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Statistical Analysis of Gene Expression Data</source>
				<publisher>New York: Chapman and Hall</publisher>
				<editor>Speed T</editor>
				<pubdate>2003</pubdate>
				<fpage>93</fpage>
				<lpage>158</lpage>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Finding predictive gene groups from microarray data.</p>
				</title>
				<aug>
					<au>
						<snm>Dettling</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>B&#252;hlmann</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>J Multivariate Anal</source>
				<pubdate>2004</pubdate>
				<volume>90</volume>
				<fpage>106</fpage>
				<lpage>131</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1016/j.jmva.2004.02.012</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Bagboosting for tumor classification with gene expression data.</p>
				</title>
				<aug>
					<au>
						<snm>Dettling</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>3583</fpage>
				<lpage>3593</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15466910</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression.</p>
				</title>
				<aug>
					<au>
						<snm>Li</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ogihara</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>2429</fpage>
				<lpage>2437</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth267</pubid>
						<pubid idtype="pmpid" link="fulltext">15087314</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Robust classification modeling on microarray data using misclassification penalized posterior.</p>
				</title>
				<aug>
					<au>
						<snm>Soukup</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Cho</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>21 (suppl 1)</volume>
				<fpage>i423</fpage>
				<lpage>i430</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bti1020</pubid>
						<pubid idtype="pmpid" link="fulltext">15961487</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Multivariate quality control.</p>
				</title>
				<aug>
					<au>
						<snm>Hotelling</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Techniques of Statistical Analysis</source>
				<publisher>New York: McGraw-Hill</publisher>
				<editor>Eisenhart C, Hastay MW, Wallis WA</editor>
				<pubdate>1947</pubdate>
				<fpage>111</fpage>
				<lpage>184</lpage>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.</p>
				</title>
				<aug>
					<au>
						<snm>Alon</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Barkai</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Notterman</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Gish</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Ybarra</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Mack</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Levine</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1999</pubdate>
				<volume>96</volume>
				<fpage>6745</fpage>
				<lpage>6750</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">21986</pubid>
						<pubid idtype="pmpid" link="fulltext">10359783</pubid>
						<pubid idtype="doi">10.1073/pnas.96.12.6745</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Princeton Colorectal Cancer Research Page</p>
				</title>
				<url>http://microarray.princeton.edu/oncology</url>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Gene-expression profiles in hereditary breast cancer.</p>
				</title>
				<aug>
					<au>
						<snm>Hedenfalk</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Duggan</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Radmacher</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bittner</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Simon</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Meltzer</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Gusterson</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Esteller</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Raffeld</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>New Engl J Med</source>
				<pubdate>2001</pubdate>
				<volume>344</volume>
				<fpage>539</fpage>
				<lpage>548</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1056/NEJM200102223440801</pubid>
						<pubid idtype="pmpid" link="fulltext">11207349</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Hedenfalk <it>BRCA1</it> Data Supplementary Page.</p>
				</title>
				<url>http://research.nhgri.nih.gov/microarray/NEJM_Supplement</url>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation.</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Dudoit</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Luu</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Lin</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Peng</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Ngai</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Speed</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>e15</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">100354</pubid>
						<pubid idtype="pmpid" link="fulltext">11842121</pubid>
						<pubid idtype="doi">10.1093/nar/30.4.e15</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>KEGG: Kyoto Encyclopedia of Genes and Genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Kanehisa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Goto</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>27</fpage>
				<lpage>30</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102409</pubid>
						<pubid idtype="pmpid" link="fulltext">10592173</pubid>
						<pubid idtype="doi">10.1093/nar/28.1.27</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Gene ontology: tool for the unification of biology: the Gene Ontology Consortium.</p>
				</title>
				<aug>
					<au>
						<snm>Ashburner</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ball</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Blake</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Botstein</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Butler</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Cherry</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Davis</snm>
						<fnm>AP</fnm>
					</au>
					<au>
						<snm>Dolinski</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Dwight</snm>
						<fnm>SS</fnm>
					</au>
					<au>
						<snm>Eppig</snm>
						<fnm>JT</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2000</pubdate>
				<volume>25</volume>
				<fpage>25</fpage>
				<lpage>29</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/75556</pubid>
						<pubid idtype="pmpid" link="fulltext">10802651</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Direct DNA binding by BRCA1.</p>
				</title>
				<aug>
					<au>
						<snm>Paull</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Cortez</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Bowers</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Elledge</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gellert</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2001</pubdate>
				<volume>98</volume>
				<fpage>6086</fpage>
				<lpage>6091</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">33426</pubid>
						<pubid idtype="pmpid" link="fulltext">11353843</pubid>
						<pubid idtype="doi">10.1073/pnas.111125998</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Marcel Dettling's Joint Differential Expression Supplementary Page</p>
				</title>
				<url>http://stat.ethz.ch/~dettling/jde.html</url>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Identifying Joint Differential Expression in Microarray Data</p>
				</title>
				<url>http://stat.ethz.ch/talks/Ascona_04/Slides/wirapati.pdf</url>
			</bibl>
			<bibl id="B20">
				<aug>
					<au>
						<cnm>R Development Core Team</cnm>
					</au>
				</aug>
				<source>R: A Language and Environment for Statistical Computing</source>
				<publisher>Vienna, Austria: R Foundation for Statistical Computing</publisher>
				<pubdate>2004</pubdate>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Bioconductor: open software development for computational biology and bioinformatics.</p>
				</title>
				<aug>
					<au>
						<snm>Gentleman</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Carey</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Bates</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Bolstad</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Dettling</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Dudoit</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ellis</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Gautier</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Ge</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Gentry</snm>
						<fnm>J</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>R80</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">545600</pubid>
						<pubid idtype="pmpid" link="fulltext">15461798</pubid>
						<pubid idtype="doi">10.1186/gb-2004-5-10-r80</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Multivariate search for differentially expressed gene combinations.</p>
				</title>
				<aug>
					<au>
						<snm>Xiao</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Frisina</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gordon</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Klebanov</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Yakovlev</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>164</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">529250</pubid>
						<pubid idtype="pmpid" link="fulltext">15507138</pubid>
						<pubid idtype="doi">10.1186/1471-2105-5-164</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Variable selection and pattern recognition with gene expression data generated by the microarray technology.</p>
				</title>
				<aug>
					<au>
						<snm>Szabo</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Boucher</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Carroll</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Klebanov</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Tsodikov</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Yakovlev</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Math Biosci</source>
				<pubdate>2002</pubdate>
				<volume>176</volume>
				<fpage>71</fpage>
				<lpage>98</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0025-5564(01)00103-1</pubid>
						<pubid idtype="pmpid" link="fulltext">11867085</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Multivariate exploratory tools for microarray data analysis.</p>
				</title>
				<aug>
					<au>
						<snm>Szabo</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Boucher</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Klebanov</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Tsodikov</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Yakovlev</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Biostatistics</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>555</fpage>
				<lpage>567</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/biostatistics/4.4.555</pubid>
						<pubid idtype="pmpid" link="fulltext">14557111</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks.</p>
				</title>
				<aug>
					<au>
						<snm>Butte</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Tamayo</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Slonim</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Golub</snm>
						<fnm>TR</fnm>
					</au>
					<au>
						<snm>Kohane</snm>
						<fnm>IS</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<fpage>12182</fpage>
				<lpage>12186</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">17315</pubid>
						<pubid idtype="pmpid" link="fulltext">11027309</pubid>
						<pubid idtype="doi">10.1073/pnas.220392197</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Functional annotation and network reconstruction through cross-platform integration of microarray data.</p>
				</title>
				<aug>
					<au>
						<snm>Zhou</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Kao</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Wong</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Nunez-Iglesias</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Primig</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Aparicio</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Finch</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Morgan</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Wong</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2005</pubdate>
				<volume>23</volume>
				<fpage>238</fpage>
				<lpage>243</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt1058</pubid>
						<pubid idtype="pmpid" link="fulltext">15654329</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia.</p>
				</title>
				<aug>
					<au>
						<snm>Armstrong</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Staunton</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Silverman</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Pieters</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>den Boer</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Minden</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sallan</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Lander</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Golub</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Korsmeyer</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>41</fpage>
				<lpage>47</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng765</pubid>
						<pubid idtype="pmpid" link="fulltext">11731795</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Broad Institute Cancer Program Publication.</p>
				</title>
				<url>http://www.broad.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&amp;paper_id=63</url>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses.</p>
				</title>
				<aug>
					<au>
						<snm>Bhattacharjee</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Richards</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Staunton</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Monti</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Vasa</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Ladd</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Beheshti</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Bueno</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gillette</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2001</pubdate>
				<volume>98</volume>
				<fpage>13790</fpage>
				<lpage>13795</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">61120</pubid>
						<pubid idtype="pmpid" link="fulltext">11707567</pubid>
						<pubid idtype="doi">10.1073/pnas.191502998</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Meyerson Laboratory: Lung Cancer Genomics</p>
				</title>
				<url>http://research.dfci.harvard.edu/meyersonlab/lungca/</url>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.</p>
				</title>
				<aug>
					<au>
						<snm>Golub</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Slonim</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Tamayo</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Huard</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Gaasenbeek</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Mesirov</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Coller</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Loh</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Downing</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Caligiuri</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>1999</pubdate>
				<volume>286</volume>
				<fpage>531</fpage>
				<lpage>537</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.286.5439.531</pubid>
						<pubid idtype="pmpid" link="fulltext">10521349</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Broad Institute: Cancer Program Datasets</p>
				</title>
				<url>http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi</url>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Estrogen receptor status in breast cancer is associated with remarkably distinct gene expression patterns.</p>
				</title>
				<aug>
					<au>
						<snm>Gruvberger</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ringner</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Panavally</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Saal</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Borg</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Fern&#246;</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Peterson</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Meltzer</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Cancer Res</source>
				<pubdate>2001</pubdate>
				<volume>61</volume>
				<fpage>5979</fpage>
				<lpage>5984</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11507038</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>NIH Website Supporting the Gruvberger <it>et al.</it> Publication.</p>
				</title>
				<url>http://research.nhgri.nih.gov/microarray/ER_data.txt</url>
			</bibl>
		</refgrp>
	</bm>
</art>
