<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2003-4-4-210</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Review</dochead>
      <bibl>
         <title>
            <p>Statistical tests for differential expression in cDNA microarray experiments</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Cui</snm>
               <fnm>Xiangqin</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2" ca="yes">
               <snm>Churchill</snm>
               <mi>A</mi>
               <fnm>Gary</fnm>
               <insr iid="I1"/>
               <email>garyc@jax.org</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2003</pubdate>
         <volume>4</volume>
         <issue>4</issue>
         <fpage>210</fpage>
         <url>http://genomebiology.com/2003/4/4/210</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2003-4-4-210</pubid>
               <pubid idtype="pmpid">12702200</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>17</day>
               <month>3</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2003</year>
         <collab>BioMed Central Ltd</collab>
      </cpyrt>
      <shorttitle>
         <p>Statistical tests for differential expression in cDNA microarray experiments</p>
      </shorttitle>
      <shortabs>
         <p>The simplest statistical method for extracting biological information from microarray data is the <it>t </it>test. Analysis of variance (ANOVA) and the mixed ANOVA model are general and powerful approaches for more complex microarray experiments.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the <it>t </it>test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010013">Methods</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p/>
         </st>
         <p>Gene-expression microarrays hold tremendous promise for revealing the patterns of coordinately regulated genes. Because of the large volume and intrinsic variation of the data obtained in each microarray experiment, statistical methods have been used as a way to systematically extract biological information and to assess the associated uncertainty. Here, we review some widely used methods for testing differential expression among conditions. For these purposes, we assume that the data to be used are of good quality and have been appropriately transformed (normalized) to ensure that experimentally introduced biases have been removed <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. See Box <figr fid="F2">1</figr> for a glossary of terms. For other aspects of microarray data analysis, please refer to recent reviews on experimental design <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp> and cluster analysis <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Comparing two conditions</p>
         </st>
         <p>A simple microarray experiment may be carried out to detect the differences in expression between two conditions. Each condition may be represented by one or more RNA samples. Using two-color cDNA microarrays, samples can be compared directly on the same microarray or indirectly by hybridizing each sample with a common reference sample <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr></abbrgrp>. The null hypothesis being tested is that there is no difference in expression between the conditions; when conditions are compared directly, this implies that the true ratio between the expression of each gene in the two samples should be one. When samples are compared indirectly, the ratios between the test sample and the reference sample should not differ between the two conditions. It is often more convenient to use logarithms of the expression ratios than the ratios themselves because effects on intensity of microarray signals tend to be multiplicative; for example, doubling the amount of RNA should double the signal over a wide range of absolute intensities. The logarithm transformation converts these multiplicative effects (ratios) into additive effects (differences), which are easier to model; the log ratio when there is no difference between conditions should thus be zero. If a single-color expression assay is used - such as the Affymetrix system <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> - we are again considering a null hypothesis of no expression-level difference between the two conditions, and the methods described in this article can also be applied directly to this type of experiment.</p>
         <p>A distinction should be made between RNA samples obtained from independent biological sources - biological replicates - and those that represent repeated sampling of the same biological material - technical replicates. Ideally, each condition should be represented by multiple independent biological samples in order to conduct statistical tests. If only technical replicates are available, statistical testing is still possible but the scope of any conclusions drawn may be limited <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. If both technical and biological replicates are available, for example if the same biological samples are measured twice each using a dye-swap assay, the individual log ratios of the technical replicates can be averaged to yield a single measurement for each biological unit in the experiment. Callow <it>et al. </it><abbrgrp><abbr bid="B8">8</abbr></abbrgrp> describe an example of a biologically replicated two-sample comparison, and our group <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> provide an example with technical replication. More complicated settings that involve multiple layers of replication can be handled using the mixed-model analysis of variance techniques described below.</p>
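         <p>The averaging of technical replicates described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name, the input layout (pairs of biological unit and log ratio) and the sample labels are assumptions made for the example, and dye-swapped log ratios are assumed to have been re-oriented already so that all values share the same sign convention.</p>
         <p><![CDATA[
```python
import statistics
from collections import defaultdict


def average_technical_replicates(measurements):
    """Collapse technical replicates to one log ratio per biological unit.

    `measurements` is an iterable of (biological_unit, log_ratio) pairs.
    Dye-swapped values are assumed to be re-oriented before this step.
    """
    by_unit = defaultdict(list)
    for unit, log_ratio in measurements:
        by_unit[unit].append(log_ratio)
    # One averaged value per independent biological sample.
    return {unit: statistics.mean(values) for unit, values in by_unit.items()}


# Hypothetical data: two mice, each measured twice.
replicated = [("mouse1", 1.0), ("mouse1", 1.5), ("mouse2", 0.5), ("mouse2", 1.0)]
per_unit = average_technical_replicates(replicated)
```
]]></p>
         <p>The resulting values, one per biological unit, are the observations that enter the tests described in the following sections.</p>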
         <sec>
            <st>
               <p>'Fold' change</p>
            </st>
            <p>The simplest method for identifying differentially expressed genes is to evaluate the log ratio between two conditions (or the average of ratios when there are replicates) and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. For example, if the cut-off value chosen is a two-fold difference, genes are taken to be differentially expressed if the expression under one condition is over two-fold greater or less than that under the other condition. This test, sometimes called 'fold' change, is not a statistical test, and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. The fold-change method is subject to bias if the data have not been properly normalized. For example, an excess of low-intensity genes may be identified as being differentially expressed because their fold-change values have a larger variance than the fold-change values of high-intensity genes <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Intensity-specific thresholds have been proposed as a remedy for this problem <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
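            <p>As a rough sketch of the fold-change rule, the following code flags genes whose mean log<sub>2 </sub>ratio exceeds a chosen cut-off in either direction. The function names, the default two-fold cut-off and the example values are illustrative assumptions; no measure of confidence is produced, which is exactly the limitation noted above.</p>
            <p><![CDATA[
```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Return log2(test / reference) for a single spot."""
    return math.log2(test_intensity / reference_intensity)


def fold_change_calls(mean_log_ratios, fold_cutoff=2.0):
    """Flag genes whose mean log2 ratio exceeds the cut-off in either direction."""
    threshold = math.log2(fold_cutoff)  # a two-fold cut-off is 1 on the log2 scale
    return {gene: abs(value) > threshold for gene, value in mean_log_ratios.items()}


# Hypothetical mean log2 ratios for three genes.
example = {"geneA": 1.6, "geneB": -0.4, "geneC": -1.2}
calls = fold_change_calls(example)
```
]]></p>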
         </sec>
         <sec>
            <st>
               <p>The <it>t </it>test</p>
            </st>
            <p>The <it>t </it>test is a simple, statistically based method for detecting differentially expressed genes (see Box <figr fid="F3">2</figr> for details of how it is calculated). In replicated experiments, the error variance (see Box <figr fid="F2">1</figr>) can be estimated for each gene from the log ratios, and a standard <it>t </it>test can be conducted for each gene <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>; the resulting <it>t </it>statistic can be used to determine which genes are significantly differentially expressed (see below). This gene-specific <it>t </it>test is not affected by heterogeneity in variance across genes because it only uses information from one gene at a time. It may, however, have low power because the sample size - the number of RNA samples measured for each condition - is small. In addition, the variances estimated from each gene are not stable: for example, if the estimated variance for one gene is small, by chance, the <it>t </it>value can be large even when the corresponding fold change is small. It is possible to compute a global <it>t </it>test, using an estimate of error variance that is pooled across all genes, if it is assumed that the variance is homogeneous between different genes <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. This is effectively a fold-change test because the global <it>t </it>test ranks genes in an order that is the same as fold change; that is, it does not adjust for individual gene variability. It may therefore suffer from the same biases as a fold-change test if the error variance is not truly constant for all genes.</p>
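            <p>The gene-specific <it>t </it>statistic can be sketched for the direct-comparison case, where the null hypothesis is a mean log ratio of zero. This is a one-sample form chosen for brevity and may differ in detail from the formulation in Box <figr fid="F3">2</figr>; the function name and example values are assumptions. The second example illustrates the instability discussed above: a gene with a small fold change but, by chance, a very small variance yields a large <it>t </it>value.</p>
            <p><![CDATA[
```python
import math
import statistics


def gene_t_statistic(log_ratios):
    """One-sample t statistic testing whether the mean log ratio is zero."""
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    standard_error = statistics.stdev(log_ratios) / math.sqrt(n)
    return mean / standard_error


# Hypothetical replicated log ratios for two genes.
large_change = gene_t_statistic([1.0, 2.0, 3.0])     # mean 2.0, sd 1.0
small_change = gene_t_statistic([0.10, 0.11, 0.09])  # mean 0.1, sd 0.01
```
]]></p>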
         </sec>
         <sec>
            <st>
               <p>Modifications of the <it>t </it>test</p>
            </st>
            <p>As noted above, the error variance (the square root of which gives the denominator of the <it>t </it>tests) is hard to estimate and subject to erratic fluctuations when sample sizes are small. More stable estimates can be obtained by combining data across all genes, but these are subject to bias when the assumption of homogeneous variance is violated. Modified versions of the <it>t </it>test (Box <figr fid="F3">2</figr>) find a middle ground that is both powerful and less subject to bias.</p>
            <p>In the 'significance analysis of microarrays' (SAM) version of the <it>t </it>test (known as the <it>S </it>test) <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, a small positive constant is added to the denominator of the gene-specific <it>t </it>test. With this modification, genes with small fold changes will not be selected as significant; this removes the problem of instability mentioned above. The regularized <it>t </it>test <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> combines information from gene-specific and global average variance estimates by using a weighted average of the two as the denominator for a gene-specific <it>t </it>test. The <it>B </it>statistic proposed by L&#246;nnstedt and Speed <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> is a log posterior odds ratio of differential expression versus non-differential expression; it allows for gene-specific variances but it also combines information across many genes and thus should be more stable than the <it>t </it>statistic (see Box <figr fid="F3">2</figr> for details).</p>
            <p>The <it>t </it>and <it>B </it>tests based on log ratios can be found in the Statistics for Microarray Analysis (SMA) package <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>; the <it>S </it>test is available in the SAM software package <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>; and the regularized <it>t </it>test is in the Cyber T package <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. In addition, the Bioconductor project <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> has a collection of various analysis tools for microarray experiments. Additional modifications of the <it>t </it>test are discussed by Pan <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
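            <p>The two denominator modifications can be illustrated with a short sketch. It is not the SAM or Cyber T implementation: the constant <it>s0</it>, the blending weight and the global variance are hypothetical inputs chosen for the example, and the published methods estimate these quantities from the data in their own ways. The <it>B </it>statistic is not sketched here.</p>
            <p><![CDATA[
```python
import math
import statistics


def t_statistic(log_ratios):
    """Ordinary gene-specific t statistic against a mean log ratio of zero."""
    n = len(log_ratios)
    return statistics.mean(log_ratios) / math.sqrt(statistics.variance(log_ratios) / n)


def s_statistic(log_ratios, s0):
    """SAM-style statistic: a small positive constant s0 is added to the denominator."""
    n = len(log_ratios)
    standard_error = math.sqrt(statistics.variance(log_ratios) / n)
    return statistics.mean(log_ratios) / (standard_error + s0)


def regularized_t(log_ratios, global_variance, weight):
    """t statistic using a weighted average of global and gene-specific variance.

    `weight` (between 0 and 1) is an illustrative parameter, not the
    weighting scheme of any particular package.
    """
    n = len(log_ratios)
    gene_variance = statistics.variance(log_ratios)
    blended = weight * global_variance + (1.0 - weight) * gene_variance
    return statistics.mean(log_ratios) / math.sqrt(blended / n)


# A gene with a small fold change and, by chance, a tiny variance.
unstable = [0.10, 0.11, 0.09]
plain = t_statistic(unstable)
damped_s = s_statistic(unstable, s0=0.2)
damped_reg = regularized_t(unstable, global_variance=0.25, weight=0.5)
```
]]></p>
            <p>In this example both modified statistics are much smaller than the ordinary <it>t </it>value, so the gene would no longer stand out merely because its estimated variance happened to be small.</p>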
         </sec>
         <sec>
            <st>
               <p>Graphical summaries (the 'volcano plot')</p>
            </st>
            <p>The 'volcano plot' is an effective and easy-to-interpret graph that summarizes both fold-change and <it>t</it>-test criteria (see Figure <figr fid="F1">1</figr>). It is a scatter-plot of the negative log<sub>10</sub>-transformed <it>p</it>-values from the gene-specific <it>t </it>test (calculated as described in the next section) against the log<sub>2 </sub>fold change (Figure <figr fid="F1">1a</figr>). Genes with statistically significant differential expression according to the gene-specific <it>t </it>test will lie above a horizontal threshold line. Genes with large fold-change values will lie outside a pair of vertical threshold lines. The significant genes identified by the <it>S, B</it>, and regularized <it>t </it>tests will tend to be located in the upper left or upper right parts of the plot.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Volcano plots</p>
               </caption>
               <text>
                  <p>Volcano plots. The negative log<sub>10</sub>-transformed <it>p</it>-values of the <it>F1 </it>test (see Box <figr fid="F4">3b</figr>) are plotted against <b>(a) </b>the log ratios (log<sub>2 </sub>fold change) in a two-sample experiment or <b>(b) </b>the standard deviations of the variety-by-gene <it>VG </it>values (see Box <figr fid="F4">3a</figr>) in a four-sample experiment. The horizontal bars in each plot represent the nominal significance level 0.001 for the <it>F1 </it>test under the assumption that each gene has a unique variance. The vertical bars represent the one-step family-wise corrected significance level 0.01 for the <it>F3 </it>test (see Box <figr fid="F4">3b</figr>) under the assumption of constant variance across all genes. Black points represent the significant genes selected by the <it>F2 </it>test with a compromise of these two variance assumptions.</p>
               </text>
               <graphic file="gb-2003-4-4-210-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Box 1</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <text>
                  <p/>
               </text>
               <graphic file="gb-2003-4-4-210-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Box 2</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <text>
                  <p/>
               </text>
               <graphic file="gb-2003-4-4-210-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Box 3</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <text>
                  <p/>
               </text>
               <graphic file="gb-2003-4-4-210-4"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Significance and multiple testing</p>
         </st>
         <sec>
            <st>
               <p>Nominal <it>p</it>-values</p>
            </st>
            <p>After a test statistic is computed, it is convenient to convert it to a <it>p</it>-value. Genes with <it>p</it>-values falling below a prescribed level (the 'nominal level') may be regarded as significant. Reporting <it>p</it>-values as a measure of evidence allows some flexibility in the interpretation of a statistical test by providing more information than a simple dichotomy of 'significant' or 'not significant' at a predefined level. Standard methods for computing <it>p</it>-values are by reference to a statistical distribution table or by permutation analysis. Tabulated <it>p</it>-values can be obtained for standard test statistics (such as the <it>t </it>test), but they often rely on the assumption that the errors in the data are normally distributed. Permutation analysis involves shuffling the data and does not require such assumptions. If permutation analysis is to be used, the experiment must be large enough that a sufficient number of distinct shuffles can be obtained. Ideally, the labels that identify which condition is represented by each sample are shuffled to simulate data from the null distribution. A minimum of about six replicates per condition (yielding a total of 924 distinct permutations) is recommended for a two-sample comparison. With multiple conditions, fewer replicates are required. If the experiment is too small, permutation analysis can be conducted by shuffling residual values across genes (see Box <figr fid="F2">1</figr>), under the assumption of homogeneous variance <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B25">25</abbr></abbrgrp>.</p>
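            <p>The label-shuffling idea can be sketched for a single gene in a two-sample comparison. The code enumerates every distinct assignment of samples to the two conditions and, for brevity, uses the absolute difference in means as the test statistic rather than a <it>t </it>statistic; the function names and example data are assumptions. The helper also reproduces the count quoted above: six replicates per condition give 924 distinct assignments.</p>
            <p><![CDATA[
```python
import itertools
import math
import statistics


def count_label_assignments(n_per_condition):
    """Number of distinct ways to split 2n samples into two groups of n."""
    return math.comb(2 * n_per_condition, n_per_condition)


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    total = 0
    at_least_as_extreme = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = abs(statistics.mean(a) - statistics.mean(b))
        total += 1
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical expression values, three replicates per condition.
p = permutation_p_value([1, 2, 3], [4, 5, 6])
```
]]></p>
            <p>With only three replicates per condition there are 20 distinct assignments, so the smallest attainable two-sided <it>p</it>-value is 2/20 = 0.1; this is why small experiments cannot support permutation of sample labels.</p>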
            <p>When we conduct a single hypothesis test, we may commit one of two types of errors. A type I or false-positive error occurs when we declare a gene to be differentially expressed when in fact it is not. A type II or false-negative error occurs when we fail to detect a differentially expressed gene. A statistical test is usually constructed to control the type I error probability, and we achieve a certain power (which is equal to one minus the type II error probability) that depends on the study design, sample size, and precision of the measurements. In a microarray experiment, we may conduct thousands of statistical tests, one for each gene, and a substantial number of false positives may accumulate. The following are some of the methods available to address this problem, which is called the problem of multiple testing.</p>
         </sec>
         <sec>
            <st>
               <p>Family-wise error-rate control</p>
            </st>
            <p>One approach to multiple testing is to control the family-wise error rate (FWER), which is the probability of accumulating one or more false-positive errors over a number of statistical tests. This is achieved by increasing the stringency that we apply to each individual test. In a list of differentially expressed genes that satisfy an FWER criterion, we can have high confidence that there will be no errors in the entire list. The simplest FWER procedure is the Bonferroni correction: the nominal significance level is divided by the number of tests. The permutation-based one-step correction <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and the Westfall and Young step-down adjustment <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> provide FWER control and are generally more powerful but more computationally demanding than the Bonferroni procedure. FWER criteria are very stringent, and they may substantially decrease power when the number of tests is large.</p>
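            <p>The Bonferroni correction is simple enough to sketch directly. The function names and the example <it>p</it>-values are illustrative; the permutation-based procedures mentioned above are not shown.</p>
            <p><![CDATA[
```python
def bonferroni_threshold(nominal_level, n_tests):
    """Per-test significance level: the nominal level divided by the number of tests."""
    return nominal_level / n_tests


def bonferroni_significant(p_values, nominal_level=0.05):
    """Return one flag per test, True where the p-value passes the corrected level."""
    threshold = bonferroni_threshold(nominal_level, len(p_values))
    return [p <= threshold for p in p_values]


# With 10,000 genes, a nominal level of 0.05 becomes 0.000005 per gene.
per_gene_level = bonferroni_threshold(0.05, 10000)

# Hypothetical p-values for four genes; the corrected level is 0.0125.
flags = bonferroni_significant([0.001, 0.02, 0.04, 0.3])
```
]]></p>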
         </sec>
         <sec>
            <st>
               <p>False-discovery-rate control</p>
            </st>
            <p>An alternative approach to multiple testing considers the false-discovery rate (FDR), which is the proportion of false positives among all of the genes initially identified as being differentially expressed - that is, among all the rejected null hypotheses <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. An arguably more appropriate variation, the positive false-discovery rate (pFDR), was proposed by Storey <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. It multiplies the FDR by a factor of &#960;<sub>0</sub>, which is the estimated proportion of non-differentially expressed genes among all genes. Because &#960;<sub>0</sub> is between 0 and 1, the estimated pFDR is smaller than the FDR. The FDR is typically computed <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> after a list of differentially expressed genes has been generated. Software for computing FDR and related quantities can be found at <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. Unlike a significance level, which is determined before looking at the data, FDR is a post-data measure of confidence. It uses information available in the data to estimate the proportion of false positive results that have occurred. In a list of differentially expressed genes that satisfies an FDR criterion, one can expect that a known proportion of these will represent false positive results. FDR criteria allow a higher rate of false positive results and thus can achieve more power than FWER procedures.</p>
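            <p>One common way to compute FDR-adjusted values is the Benjamini-Hochberg step-up procedure, sketched below. Whether this matches the exact procedures in the cited software is an assumption. The optional <it>pi0 </it>argument is a hypothetical parameter that applies the multiplication by &#960;<sub>0 </sub>described above; estimating &#960;<sub>0 </sub>from the data is not shown.</p>
            <p><![CDATA[
```python
def benjamini_hochberg(p_values, pi0=1.0):
    """Return FDR-adjusted values in the original order of `p_values`.

    `pi0` is an assumed, externally supplied estimate of the proportion of
    non-differentially expressed genes; pi0=1.0 gives the plain procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pi0 * p_values[i] * m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four genes.
nominal = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(nominal)
fdr_pi0 = benjamini_hochberg(nominal, pi0=0.5)
```
]]></p>
            <p>A gene list is then formed by keeping genes whose adjusted value falls below the chosen FDR level. Because the supplied factor is at most 1, the values computed with it are never larger than the unscaled ones.</p>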
         </sec>
      </sec>
      <sec>
         <st>
            <p>More than two conditions</p>
         </st>
         <sec>
            <st>
               <p>Relative expression values</p>
            </st>
            <p>When there are more than two conditions in an experiment, we cannot simply compute ratios; a more general concept of relative expression is needed. One approach that can be applied to cDNA microarray data from any experimental design is to use an analysis of variance (ANOVA) model (Box <figr fid="F4">3a</figr>) to obtain estimates of the relative expression (<it>VG</it>) for each gene in each sample <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B34">34</abbr></abbrgrp>. In the microarray ANOVA model, the expression level of a gene in a given sample is computed relative to the weighted average expression of that gene over all samples in the experiment (see Box <figr fid="F4">3a</figr> for statistical details). We note that the microarray ANOVA model is not based on ratios but is applied directly to intensity data; the difference between two relative expression values can be interpreted as the mean log ratio for comparing two samples (as log<it>A </it>- log<it>B </it>= log(<it>A</it>/<it>B</it>), where log <it>A </it>and log <it>B </it>are two relative expression values). Alternatively, if each sample is compared with a common reference sample, one can use normalized ratios directly. This is an intuitive but less efficient approach to obtaining relative expression values than using the ANOVA estimates. Direct estimates of relative expression can also be obtained from single-color expression assays <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>.</p>
            <p>The set of estimated relative expression values, one for each gene in each RNA sample, is a derived data set that can be subjected to a second level of analysis. There should be one relative expression value for each gene in each independent sample. The distinction between technical replication and biological replication should be kept in mind when interpreting results from the analysis of a derived data set. If inference is being made on the basis of biological replicates and there is also technical replication in the experiment, the technical replicates should be averaged to yield a single value for each independent biological unit. The derived data can be analyzed on a gene-by-gene basis using standard ANOVA methods to test for differences among conditions. For example, our group <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> have used a derived data set to test for expression differences between natural populations of fish.</p>
         </sec>
         <sec>
            <st>
               <p>Three flavors of <it>F </it>test</p>
            </st>
            <p>The classical ANOVA <it>F </it>test is a generalization of the <it>t </it>test that allows for the comparison of more than two samples (Box <figr fid="F4">3</figr>). The <it>F </it>test is designed to detect any pattern of differential expression among several conditions by comparing the variation among replicated samples within and between conditions. As with the <it>t </it>test, there are several variations on the <it>F </it>test (Box <figr fid="F4">3b</figr>). The gene-specific <it>F </it>test (<it>F1</it>), a generalization of the gene-specific <it>t </it>test, is the usual <it>F </it>test and it is computed on a gene-by-gene basis. As with <it>t </it>tests, we can also assume a common error variance for all genes and thus arrive at the global variance <it>F </it>test (<it>F3</it>). A middle ground is achieved by the <it>F2 </it>test, analogous to the regularized <it>t </it>test; this uses a weighted combination of global and gene-specific variance estimates in the denominator. Nominal <it>p</it>-values can be obtained for the <it>F1 </it>test from standard tables, but the <it>F2 </it>and <it>F3 </it>statistics do not follow the tabulated <it>F </it>distribution and critical values should be established by permutation analysis.</p>
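            <p>The relationship among the three statistics can be sketched for one gene using a one-way layout. The numerator is the usual between-condition mean square; only the error variance in the denominator changes. The global variance and the equal weighting used for the <it>F2</it>-like value are hypothetical inputs for illustration and are not taken from Box <figr fid="F4">3b</figr>.</p>
            <p><![CDATA[
```python
import statistics


def mean_squares(groups):
    """Return (between-condition, within-condition) mean squares for one gene."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (k - 1), ss_within / (n_total - k)


def f_statistic(groups, error_variance=None):
    """F-type statistic; defaults to the gene-specific error variance (F1)."""
    ms_between, ms_within = mean_squares(groups)
    denominator = ms_within if error_variance is None else error_variance
    return ms_between / denominator


# Hypothetical relative expression values: three conditions, three replicates each.
gene = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
assumed_global_variance = 2.0  # would be pooled across all genes in practice

f1 = f_statistic(gene)                           # gene-specific variance
f3 = f_statistic(gene, assumed_global_variance)  # global variance
gene_variance = mean_squares(gene)[1]
f2_like = f_statistic(gene, 0.5 * gene_variance + 0.5 * assumed_global_variance)
```
]]></p>
            <p>In this example the gene-specific variance is smaller than the assumed global variance, so the blended statistic falls between the other two.</p>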
            <p>Among these tests, the <it>F3 </it>test is the most powerful, but it is also subject to the same potential biases as the fold-change test. In our experience, <it>F2 </it>has power comparable to <it>F3 </it>but it has a lower FDR than either <it>F1 </it>or <it>F3</it>. It is possible to derive a version of the <it>B </it>statistic <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> for the case of multiple conditions. This could provide an alternative approach to combine variance estimates across genes in the context of multiple samples. Any of these tests can be applied to a derived data set of relative expression values to make comparisons among two or more conditions.</p>
            <p>The results of all three <it>F </it>statistics can be summarized simultaneously using a volcano plot, but with a slight twist when there are more than two samples. The standard deviation of the relative expression values is plotted on the <it>x </it>axis instead of plotting log fold change; the resulting volcano plot (Figure <figr fid="F1">1b</figr>) is similar to the right-hand half of a standard volcano plot (Figure <figr fid="F1">1a</figr>).</p>
         </sec>
         <sec>
            <st>
               <p>The fixed-effects ANOVA model</p>
            </st>
            <p>The process of creating a derived data set and computing the <it>F </it>tests described above can be integrated in one step by applying <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B35">35</abbr></abbrgrp> our fixed-effects ANOVA model <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>; further discussion is provided by Lee <it>et al. </it><abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The fixed-effects model assumes independence among all observations and only one source of random variation. Depending on the experimental design, this source of variation could be technical, as in our study <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, or biological if applied to data as was done by Callow <it>et al. </it><abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Although it is applicable to many microarray experiments, the fixed-effects model does not allow for multiple sources of variation, nor does it account for correlation among the observations that arise as a consequence of different layers of variation. Test statistics from the fixed-effects model are constructed using the lowest level of variation in the experiment: if a design includes both biological and technical replication, tests are based on the technical variance component. If there are replicated spots on the microarrays, the lowest level of variance will be the within-array measurement error. This is rarely appropriate for testing, and the statistical significance of results using within-array error may be artificially inflated. To avoid this problem, replicated spots from the same array can be 'collapsed' by taking the sum or average of their raw intensities. This does not fully utilize the available information, however, and we recommend application of the mixed-effects ANOVA model, described below.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Multiple-factor experiments</p>
         </st>
         <p>In a complex microarray experiment, the set of conditions may have some structure. For example, Jin <it>et al. </it><abbrgrp><abbr bid="B38">38</abbr></abbrgrp> consider eight conditions in a 2 by 2 by 2 factorial design with the factors sex, age, and genotype. There is no biological replication here, but information about biological variance is available because of the factorial design. In other experiments, both biological and technical replicates are included. For example, we <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> considered samples of five fish from each of three populations, and each fish was assayed on two microarrays with duplicated spots. In this study, the conditions of interest are the populations from which the fish were sampled; the fish are biological replicates, and there are two nested levels of technical replication, arrays and spots within arrays. To use fully the information available in experiments with multiple factors and multiple layers of sampling, we require a sophisticated statistical modeling approach.</p>
         <sec>
            <st>
               <p>The mixed-model ANOVA</p>
            </st>
            <p>The mixed model treats some of the factors in an experimental design as random samples from a population. In other words, we assume that if the experiment were to be repeated, the same effects would not be exactly reproduced but that similar effects would be drawn from a hypothetical population of effects. We therefore model these factors as sources of variance.</p>
            <p>In a mixed model for two-color microarrays (Box <figr fid="F4">3c</figr>), the gene-specific array effect (<it>AG </it>in Box <figr fid="F4">3a</figr>) is treated as a random factor. This captures an important component of technical variation. If the same clone is printed multiple times on each array we should include additional random factors for spot (<it>S</it>) and labeling (<it>L</it>) effects. Consider an array with duplicate spots of each clone. Four measurements are obtained for each clone, two in the red channel and two in the green channel. Measurements obtained on the same spot (one red and one green) will be correlated because they share common variation in the spot size. Measurements obtained in the same color (both red or both green) will be correlated because they share variation through a common labeling reaction. Failure to account for these correlations can result in underestimation of technical variance and inflated assessments of statistical significance.</p>
            <p>In experiments with multiple factors, the <it>VG </it>term in the ANOVA model is expanded to have a structure that reflects the experimental design at the level of the biological replicates, that is, independent biological samples obtained from the same conditions such as two mice of the same sex and strain. This may include both fixed and random components. Biological replicates should be treated as a random factor and will be included in the error variance of any tests that make comparisons among conditions. This provides a broad-sense inference (see Box <figr fid="F2">1</figr>) that applies to the biological population from which replicate samples were obtained <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B39">39</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Constructing tests with the mixed-model ANOVA</p>
            </st>
            <p>The components of variation attributable to each random factor in a mixed model can be estimated by any of several methods <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, of which restricted maximum likelihood (see Box <figr fid="F2">1</figr>) is the most widely used. The presence of random effects in a model can influence the estimation of other effects, including the relative expression values; these will tend to 'shrink' toward zero slightly. This effectively reduces the bias in the extremes of estimated relative expression values.</p>
            <p>In the fixed-effects ANOVA model, there is only one variance term and all factors in the model are tested against this variance. In mixed-model ANOVA, there are multiple levels of variance (biological, array, spot, and residual), and the question becomes which level should be used for testing. The answer depends on the scope of inference that is of interest. If the interest is restricted to the specific materials and procedures used in the experiment, a narrow-sense inference, which applies only to the biological samples used in the experiment, can be made using technical variance. In most instances, however, we will be interested in a broader sense of inference that includes the biological population from which our material was sampled. In this case, all relevant sources of variance should be considered in the test <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Constructing an appropriate test statistic using the mixed model can be tricky <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> and falls outside the scope of the present discussion, but software tools that compute appropriate <it>F </it>statistics are available, such as MAANOVA <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> and SAS <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Variations analogous to the <it>F2 </it>and <it>F3 </it>statistics are available in the MAANOVA software package <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
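            <p>The distinction between the two scopes of inference can be made concrete with a small sketch. The code below assumes a balanced design for a single gene, with two conditions, three biological samples per condition and two technical measurements per sample; the data values are hypothetical. It is not the procedure implemented in MAANOVA or SAS, only the textbook mean-square calculation for a balanced nested design, in which the condition effect is tested against either the technical mean square (narrow sense) or the mean square among biological samples within conditions (broad sense).</p>
            <p><![CDATA[
```python
from statistics import mean

# Hypothetical log-scale expression values for one gene (assumed data).
# Each inner list holds the technical replicates of one biological sample.
data = {
    "A": [[1.0, 1.2], [1.6, 1.8], [0.7, 0.9]],
    "B": [[1.9, 2.1], [1.4, 1.6], [2.3, 2.5]],
}

n_cond = len(data)  # number of conditions
n_bio = 3           # biological samples per condition
n_tech = 2          # technical replicates per biological sample

sample_means = {c: [mean(s) for s in samples] for c, samples in data.items()}
cond_means = {c: mean(m) for c, m in sample_means.items()}
grand_mean = mean(list(cond_means.values()))

# Mean square for the condition effect.
ss_cond = n_bio * n_tech * sum((cond_means[c] - grand_mean) ** 2 for c in data)
ms_cond = ss_cond / (n_cond - 1)

# Mean square among biological samples within conditions.
ss_bio = n_tech * sum(
    (m - cond_means[c]) ** 2 for c in data for m in sample_means[c]
)
ms_bio = ss_bio / (n_cond * (n_bio - 1))

# Mean square among technical replicates within biological samples.
ss_tech = sum(
    (y - m) ** 2
    for c in data
    for sample, m in zip(data[c], sample_means[c])
    for y in sample
)
ms_tech = ss_tech / (n_cond * n_bio * (n_tech - 1))

# Narrow sense: conclusions apply only to the samples in the experiment.
f_narrow = ms_cond / ms_tech
# Broad sense: conclusions apply to the population the samples came from.
f_broad = ms_cond / ms_bio

print(f"narrow-sense F = {f_narrow:.2f}")
print(f"broad-sense F = {f_broad:.2f}")
```
]]></p>
            <p>With these assumed values the biological samples differ from one another far more than the technical replicates do, so the narrow-sense statistic is much larger than the broad-sense statistic. The same data can thus appear highly significant for the particular animals studied while providing weaker evidence about the population.</p>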
            <p>In conclusion, fold change is the simplest method for detecting differential expression, but the arbitrary nature of the cutoff value, the lack of statistical confidence measures, and the potential for biased conclusions all detract from its appeal. The <it>t </it>test based on log ratios, together with its variations, provides a rigorous statistical framework for comparing two conditions and requires replication of samples within each condition. When there are more than two conditions to compare, a more general approach is provided by the application of ANOVA <it>F </it>tests. These may be computed from derived sets of estimated relative expression values or directly through the application of a fixed-effects ANOVA model. The mixed ANOVA model provides a general and powerful approach that makes full use of the information available in microarray experiments with multiple factors and/or a hierarchy of sources of variation. Modifications of both <it>t </it>tests and <it>F </it>tests are available to address the problems of gene-to-gene variance heterogeneity and small sample size.</p>
         </sec>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Data transformation for cDNA microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Cui</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <pubdate>2002</pubdate>
            <url>http://www.jax.org/staff/churchill/labsite/research/expression/Cui-Transform.pdf</url>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Computational analysis of microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>418</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35076576</pubid>
                  <pubid idtype="pmpid" link="fulltext">11389458</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Fundamentals of experimental design for cDNA microarrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>32 Suppl</volume>
            <fpage>490</fpage>
            <lpage>495</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1031</pubid>
                  <pubid idtype="pmpid" link="fulltext">12454643</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Design issues for cDNA microarray experiments.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>579</fpage>
            <lpage>588</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12154381</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Clustering methods for the analysis of DNA microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Stanford, Tech report</source>
            <pubdate>1999</pubdate>
            <url>http://www-stat.stanford.edu/~tibs/research.html</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Analysis of variance for gene expression microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Kerr</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>819</fpage>
            <lpage>837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270050514954</pubid>
                  <pubid idtype="pmpid" link="fulltext">11382364</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Affymetrix</p>
            </title>
            <url>http://www.affymetrix.com</url>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Microarray expression profiling identifies genes with altered expression in HDL-deficient mice.</p>
            </title>
            <aug>
               <au>
                  <snm>Callow</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Dudoit</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gong</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>2022</fpage>
            <lpage>2029</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.10.12.2022</pubid>
                  <pubid idtype="pmpid" link="fulltext">11116096</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Statistical analysis of a gene expression microarray experiment with replication.</p>
            </title>
            <aug>
               <au>
                  <snm>Kerr</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Afshari</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bushel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Martinez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Statistica Sinica</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>203</fpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Parallel human genome analysis: microarray-based expression monitoring of 1000 genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Schena</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shalon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Heller</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <fpage>10614</fpage>
            <lpage>10619</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">38202</pubid>
                  <pubid idtype="pmpid" link="fulltext">8855227</pubid>
                  <pubid idtype="doi">10.1073/pnas.93.20.10614</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Exploring the metabolic and genetic control of gene expression on a genomic scale.</p>
            </title>
            <aug>
               <au>
                  <snm>DeRisi</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>VR</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>278</volume>
            <fpage>680</fpage>
            <lpage>686</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.278.5338.680</pubid>
                  <pubid idtype="pmpid" link="fulltext">9381177</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Statistical intelligence: effective analysis of high-density microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Draghici</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Drug Discov Today</source>
            <pubdate>2002</pubdate>
            <volume>7</volume>
            <fpage>S55</fpage>
            <lpage>S63</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1359-6446(02)02292-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">12047881</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A model for measurement error for gene expression arrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Rocke</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <fpage>557</fpage>
            <lpage>569</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652701753307485</pubid>
                  <pubid idtype="pmpid" link="fulltext">11747612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Newton</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Kendziorski</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Richmond</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Tsui</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <fpage>37</fpage>
            <lpage>52</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652701300099074</pubid>
                  <pubid idtype="pmpid" link="fulltext">11339905</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Within the fold: assessing differential expression measures and reproducibility in microarray assays.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hasseman</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sharov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Saeed</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0062.1</fpage>
            <lpage>0062.12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">133446</pubid>
                  <pubid idtype="pmpid" link="fulltext">12429061</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Genome-wide expression profiling of mid-gestation placenta and embryo using a 15,000 mouse developmental cDNA microarray.</p>
            </title>
            <aug>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Jaradat</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Kargul</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Grahovac</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Pantano</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sano</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Piao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nagaraja</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>9127</fpage>
            <lpage>9132</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">16833</pubid>
                  <pubid idtype="pmpid" link="fulltext">10922068</pubid>
                  <pubid idtype="doi">10.1073/pnas.97.16.9127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Global gene expression profiling in <it>Escherichia coli </it>K12. The effects of integration host factor.</p>
            </title>
            <aug>
               <au>
                  <snm>Arfin</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Ito</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Tolleri</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Riehle</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Paegle</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Hatfield</snm>
                  <fnm>GW</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2000</pubdate>
            <volume>275</volume>
            <fpage>29672</fpage>
            <lpage>29684</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M002247200</pubid>
                  <pubid idtype="pmpid" link="fulltext">10871608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Significance analysis of microarrays applied to the ionizing radiation response.</p>
            </title>
            <aug>
               <au>
                  <snm>Tusher</snm>
                  <fnm>VG</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>5116</fpage>
            <lpage>5121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">33173</pubid>
                  <pubid idtype="pmpid" link="fulltext">11309499</pubid>
                  <pubid idtype="doi">10.1073/pnas.091062498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes.</p>
            </title>
            <aug>
               <au>
                  <snm>Baldi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>AD</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>509</fpage>
            <lpage>519</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.6.509</pubid>
                  <pubid idtype="pmpid" link="fulltext">11395427</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Replicated microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Lonnstedt</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Statistica Sinica</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>31</fpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>R package: statistics for microarray analysis</p>
            </title>
            <url>http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html</url>
         </bibl>
         <bibl id="B22">
            <title>
               <p>SAM: Significance Analysis of Microarrays</p>
            </title>
            <url>http://www-stat.stanford.edu/%7Etibs/SAM</url>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Cyber T</p>
            </title>
            <url>http://www.igb.uci.edu/servers/cybert/</url>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Bioconductor</p>
            </title>
            <url>http://www.bioconductor.org</url>
         </bibl>
         <bibl id="B25">
            <title>
               <p>A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments.</p>
            </title>
            <aug>
               <au>
                  <snm>Pan</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>546</fpage>
            <lpage>554</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.4.546</pubid>
                  <pubid idtype="pmpid" link="fulltext">12016052</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>MAANOVA: a software package for the analysis of spotted cDNA microarray experiments.</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kerr</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Cui</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <url>http://www.jax.org/staff/churchill/labsite/pubs/Wu_maanova.pdf</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments.</p>
            </title>
            <aug>
               <au>
                  <snm>Dudoit</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Callow</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <pubdate>2000</pubdate>
            <url>http://www.stat.berkeley.edu/users/terry/zarray/Html/matt.html</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Controlling the false discovery rate: a practical and powerful approach to multiple testing.</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J R Stat Soc B</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The control of the false discovery rate in multiple testing under dependency.</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yekutieli</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Ann Stat</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>1165</fpage>
            <lpage>1188</lpage>
         </bibl>
         <bibl id="B30">
            <title>
               <p>A direct approach to false discovery rates.</p>
            </title>
            <aug>
               <au>
                  <snm>Storey</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J R Stat Soc B</source>
            <pubdate>2002</pubdate>
            <volume>64</volume>
            <fpage>479</fpage>
            <lpage>498</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1111/1467-9868.00346</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Storey</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <pubdate>2003</pubdate>
            <url>http://www.stat.berkeley.edu/~storey/papers/storey-springer.pdf</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>False Discovery Rate homepage</p>
            </title>
            <url>http://www.math.tau.ac.il/~roee/index.htm</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>q-value</p>
            </title>
            <url>http://www.stat.berkeley.edu/~storey/qvalue/index.html</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Models for microarray gene expression data.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Whitmore</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Beier</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Biopharm Stat</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1</fpage>
            <lpage>19</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1081/BIP-120005737</pubid>
                  <pubid idtype="pmpid">12146717</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>research0049.1</fpage>
            <lpage>0049.12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">60310</pubid>
                  <pubid idtype="pmpid" link="fulltext">11737948</pubid>
                  <pubid idtype="doi">10.1186/gb-2001-2-11-research0049</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Exploration, normalization and summaries of high density oligonucleotide array probe level data.</p>
            </title>
            <aug>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Hobbs</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Collin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Beazer-Barclay</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Antonellis</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Scherf</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <pubdate>2002</pubdate>
            <url>http://biosun01.biostat.jhsph.edu/~ririzarr/papers/index.html</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Variation in gene expression within and among natural populations.</p>
            </title>
            <aug>
               <au>
                  <snm>Oleksiak</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>32</volume>
            <fpage>261</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng983</pubid>
                  <pubid idtype="pmpid" link="fulltext">12219088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>The contributions of sex, genotype and age to transcriptional variance in <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Jin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Wolfinger</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Passador-Gurgel</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>389</fpage>
            <lpage>395</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng766</pubid>
                  <pubid idtype="pmpid" link="fulltext">11726925</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A unified approach to mixed linear models.</p>
            </title>
            <aug>
               <au>
                  <snm>McLean</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Sanders</snm>
                  <fnm>WL</fnm>
               </au>
               <au>
                  <snm>Stroup</snm>
                  <fnm>WW</fnm>
               </au>
            </aug>
            <source>Am Stat</source>
            <pubdate>1991</pubdate>
            <volume>45</volume>
            <fpage>54</fpage>
            <lpage>64</lpage>
         </bibl>
         <bibl id="B40">
            <aug>
               <au>
                  <snm>Searle</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Casella</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>McCulloch</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Variance Components. New York, NY: John Wiley and Sons, Inc.;</source>
            <pubdate>1992</pubdate>
         </bibl>
         <bibl id="B41">
            <aug>
               <au>
                  <snm>Littell</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Milliken</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Stroup</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Wolfinger</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>SAS system for mixed models. Cary, NC: SAS Institute Inc.;</source>
            <pubdate>1996</pubdate>
         </bibl>
         <bibl id="B42">
            <title>
               <p>R/maanova</p>
            </title>
            <url>http://www.jax.org/staff/churchill/labsite/software/anova/rmaanova</url>
         </bibl>
         <bibl id="B43">
            <title>
               <p>SAS microarray solution</p>
            </title>
            <url>http://www.sas.com/industry/pharma/mas.html</url>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Statistics glossary</p>
            </title>
            <url>http://www.statsoftinc.com/textbook/glosfra.html</url>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Glossary</p>
            </title>
            <url>http://www.csse.monash.edu.au/~lloyd/tildeMML/Glossary/</url>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Internet glossary of statistical terms</p>
            </title>
            <url>http://www.animatedsoftware.com/statglos/statglos.htm</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Microarrays and their use in a comparative experiment.</p>
            </title>
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Goss</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <pubdate>2000</pubdate>
            <url>http://www-stat.stanford.edu/~tibs/research.html</url>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Assessing gene significance from cDNA microarray expression data via mixed models.</p>
            </title>
            <aug>
               <au>
                  <snm>Wolfinger</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wolfinger</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hamadeh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bushel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Afshari</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Paules</snm>
                  <fnm>RS</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <fpage>625</fpage>
            <lpage>637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652701753307520</pubid>
                  <pubid idtype="pmpid" link="fulltext">11747616</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Statistical design and the analysis of gene expression microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Kerr</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Churchill</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Genet Res</source>
            <pubdate>2001</pubdate>
            <volume>77</volume>
            <fpage>123</fpage>
            <lpage>128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1017/S0016672301005055</pubid>
                  <pubid idtype="pmpid">11355567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
