<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-5-r69</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>Towards the uniform distribution of null <it>P </it>values on Affymetrix microarrays</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Fodor</snm>
               <mi>A</mi>
               <fnm>Anthony</fnm>
               <insr iid="I1"/>
               <email>anthony.fodor@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Tickle</snm>
               <mi>L</mi>
               <fnm>Timothy</fnm>
               <insr iid="I1"/>
               <email>tltickle@email.uncc.edu</email>
            </au>
            <au id="A3">
               <snm>Richardson</snm>
               <fnm>Christine</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>caricha2@email.uncc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Bioinformatics Resource Center, The University of North Carolina at Charlotte, University City Boulevard, Charlotte, North Carolina 28223, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biology, The University of North Carolina at Charlotte, University City Boulevard, Charlotte, North Carolina 28223, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>5</issue>
         <fpage>R69</fpage>
         <url>http://genomebiology.com/2007/8/5/R69</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17472745</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-5-r69</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>11</day>
               <month>9</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>8</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>1</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>01</day>
               <month>05</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Fodor et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Uniform distribution of microarray <it>P </it>values</p>
      </shorttitle>
      <shortabs>
         <p>Estimating the <it>P </it>value from the overall distribution of scores on the microarray can produce <it>P </it>values that are much closer to a uniform distribution.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>Methods to control false-positive rates require that <it>P </it>values of genes that are not differentially expressed follow a uniform distribution. Commonly used microarray statistics can generate <it>P </it>values that do not meet this assumption. We show that poorly characterized variance, imperfect normalization, and cross-hybridization are among the many causes of this non-uniform distribution. We demonstrate a simple technique that produces <it>P </it>values that are close to uniform for nondifferentially expressed genes in control datasets.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010011">Immunology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Microarray data typically involve tens of thousands of genes but only a handful of replicates. It is therefore difficult to establish appropriate <it>P </it>value thresholds for significance. For example, consider the response of 40,000 genes to two different experimental conditions, say diseased and healthy tissue. If a significance level of <it>P </it>&lt; 0.05 is chosen, then even if no gene were differentially expressed one would expect an unacceptable number of false positives (2,000 [40,000 &#215; 0.05]). A conceptually simple procedure, the Bonferroni correction, would set a threshold of <it>P </it>= 1.25 &#215; 10<sup>-6 </sup>(0.05/40,000). Using this <it>P </it>value as the threshold for significance, there is at most a 0.05 chance of one or more false positives across all of the 40,000 comparisons between the two conditions. Such procedures are said to control the 'family-wise error rate'. Control of the family-wise error rate is often considered too conservative for microarray experiments because, with the modest number of samples that make up most microarray experiments, there are often no results with <it>P </it>values below the threshold. Recently, 'false discovery rate' (FDR) was proposed as an alternative, more permissive approach to estimating significance of microarray experiments <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. This metric acknowledges that biologists are often able to tolerate some error in gene lists. For example, an FDR could be set at 10%, in which case a list of 100 genes would be expected to contain about 10 false positives.</p>
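         <p>The arithmetic behind these thresholds can be checked with a short sketch. The variable names are illustrative only and are not part of any analysis pipeline used in this report; the family-wise bound shown is the simple union bound.</p>

```python
# Worked arithmetic for the thresholds discussed in the text.
n_genes = 40_000   # number of genes tested (from the text)
alpha = 0.05       # nominal per-gene significance level

# Expected false positives if every gene is null and tested at alpha.
expected_false_positives = n_genes * alpha

# Bonferroni: divide the nominal level by the number of tests.
bonferroni_threshold = alpha / n_genes

# Union bound on the chance of one or more false positives when every
# gene is tested at the Bonferroni threshold.
fwer_bound = n_genes * bonferroni_threshold

# Expected false positives in a list of 100 genes chosen at an FDR of 10%.
fdr = 0.10
list_size = 100
expected_false_in_list = fdr * list_size

print(expected_false_positives, bonferroni_threshold,
      fwer_bound, expected_false_in_list)
```

         <p>The Bonferroni threshold shrinks in proportion to the number of genes, which is why it so often leaves no gene below the cutoff when only a few replicates are available.</p>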
         <p>No matter what threshold is used to control significance in microarray experiments, there is an inherent assumption that the <it>P </it>values of genes that are not differentially expressed follow a uniform distribution. For example, genes that are not differentially expressed should have a <it>P </it>value of 0.01 or smaller only 1% of the time. The uniform distribution of null <it>P </it>values seems like a safe assumption that is guaranteed by the laws of statistics. However, if for some reason this assumption is not met, then attempts to determine a threshold of significance may yield meaningless results <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B5">5</abbr></abbrgrp>.</p>
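         <p>A minimal simulation, assuming normally distributed scores rather than real microarray data, illustrates both halves of this statement. When the distribution used to compute the P values matches the one that generated the null scores, the fraction of P values at or below any threshold is close to that threshold. When the spread of the null scores is underestimated (the value 1.5 below is an arbitrary, hypothetical choice), small P values become far too common.</p>

```python
import random
from statistics import NormalDist

random.seed(0)                 # fixed seed so the sketch is repeatable
std_normal = NormalDist()      # standard normal: mean 0, standard deviation 1

def two_sided_p(z):
    """Two-sided P value of a score z, assuming a standard normal null."""
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))

def fraction_at_or_below(p_values, threshold):
    """Fraction of P values that do not exceed the threshold."""
    return sum(1 for p in p_values if threshold >= p) / len(p_values)

n_null = 100_000  # hypothetical number of simulated null genes

# Case 1: the assumed null distribution matches the real one.
p_correct = [two_sided_p(random.gauss(0.0, 1.0)) for _ in range(n_null)]

# Case 2: the real spread (1.5) is larger than the assumed spread (1.0),
# a stand-in for a poorly characterized variance.
p_misspecified = [two_sided_p(random.gauss(0.0, 1.5)) for _ in range(n_null)]

# Columns: threshold, fraction in case 1, fraction in case 2.
for threshold in (0.01, 0.05, 0.5):
    print(threshold,
          round(fraction_at_or_below(p_correct, threshold), 4),
          round(fraction_at_or_below(p_misspecified, threshold), 4))
```

         <p>In the first case the printed fractions track the thresholds; in the second they exceed them, so any significance cutoff derived under the uniform assumption would admit more false positives than intended.</p>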
         <p>In this report we show that commonly used statistics can in fact generate distributions of <it>P </it>values for nondifferentially expressed genes that are far from uniform. We demonstrate a simple method for producing <it>P </it>values that are much closer to the expected uniform distribution.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>RMA summation and quantile-quantile normalization suppress the pooled variance of each gene</p>
            </st>
            <p>Our central argument is that it is reasonable to assume that, when comparing two conditions, the pooled variance of each gene on the array is approximately constant. If this assumption is true, then the distribution of scores under a <it>t </it>test or variant approaches the normal distribution. We begin our case for this assumption by examining a control dataset released by Affymetrix. The Affymetrix HG-U133A Latin Square dataset consists of 14 'experiments' (labeled 1 to 14), each with three replicates. Each of the 14 experiments contains 42 genes that are spiked in at known concentrations against a constant background of human RNA. Of the approximately 22,000 genes on the chip, the only ones that should differ when comparing across experiments are the 42 genes that were spiked in at different concentrations. We shall refer to genes that were not spiked in as null genes, because the null hypothesis of equal expression in all conditions is true for these genes.</p>
            <p>For two experimental conditions with sample sizes n<sub>1 </sub>and n<sub>2</sub>, the usual definition of the <it>t </it>statistic assuming equal variance is:</p>
            <p>
               <display-formula id="M1">
                  <m:math name="gb-2007-8-5-r69-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>t</m:mi>
                           <m:mi/>
                           <m:mo>=</m:mo>
                           <m:mi/>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>x</m:mi>
                                       <m:mo>&#175;</m:mo>
                                    </m:mover>
                                    <m:mn>1</m:mn>
                                 </m:msub>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>x</m:mi>
                                       <m:mo>&#175;</m:mo>
                                    </m:mover>
                                    <m:mn>2</m:mn>
                                 </m:msub>
                              </m:mrow>
                              <m:mrow>
                                 <m:msqrt>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mi>&#963;</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:msqrt>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaacaWF0bqefeKCPfgBaGabaiaa+bcaiiaacqqF9aqpcaGFGaWaaSaaaeaaceWF4bGbaebadaWgaaWcbaGaa8xmaaqabaGccqqFsislceWF4bGbaebadaWgaaWcbaGaa8NmaaqabaaakeaadaGcaaqaaGGaciab8n8aZnaaCaaaleqabaGaa8Nmaaaaaeqaaaaaaaa@401B@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>
               <display-formula id="M2">
                  <m:math name="gb-2007-8-5-r69-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msup>
                              <m:mi>&#963;</m:mi>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>j</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>n</m:mi>
                                             <m:mn>1</m:mn>
                                          </m:msub>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>x</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mover accent="true">
                                                   <m:mi>x</m:mi>
                                                   <m:mo>&#175;</m:mo>
                                                </m:mover>
                                                <m:mn>1</m:mn>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mstyle>
                                 <m:mo>+</m:mo>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>j</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>n</m:mi>
                                             <m:mn>2</m:mn>
                                          </m:msub>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>x</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mover accent="true">
                                                   <m:mi>x</m:mi>
                                                   <m:mo>&#175;</m:mo>
                                                </m:mover>
                                                <m:mn>2</m:mn>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>n</m:mi>
                                    <m:mn>1</m:mn>
                                 </m:msub>
                                 <m:mi/>
                                 <m:mo>+</m:mo>
                                 <m:mi/>
                                 <m:msub>
                                    <m:mi>n</m:mi>
                                    <m:mn>2</m:mn>
                                 </m:msub>
                                 <m:mi/>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>2</m:mn>
                              </m:mrow>
                           </m:mfrac>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mn>1</m:mn>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>n</m:mi>
                                          <m:mn>1</m:mn>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                                 <m:mo>+</m:mo>
                                 <m:mfrac>
                                    <m:mn>1</m:mn>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>n</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaiiGacqWFdpWCdaahaaWcbeqaaGqaaiaa+jdaaaGccqGH9aqpdaWcaaqaamaaqahabaaccaGae0hkaGIaa4hEamaaBaaaleaacaGFQbaabeaakiabgkHiTiqa+HhagaqeamaaBaaaleaacaGFXaaabeaakiab9LcaPmaaCaaaleqabaGaa4NmaaaaaeaacaGFQbGaeyypa0Jaa4xmaaqaaiaa+5gadaWgaaadbaGaa4xmaaqabaaaniabggHiLdGccqGHRaWkdaaeWbqaaiab9HcaOiaa+HhadaWgaaWcbaGaa4NAaaqabaGccqGHsislceGF4bGbaebadaWgaaWcbaGaa4NmaaqabaGccqqFPaqkdaahaaWcbeqaaiaa+jdaaaaabaGaa4NAaiabg2da9iaa+fdaaeaacaGFUbWaaSbaaWqaaiaa+jdaaeqaaaqdcqGHris5aaGcbaGaa4NBamaaBaaaleaacaGFXaaabeaaruqqYLwySbaceaGccaaFGaGae03kaSIaaWhiaiaa+5gadaWgaaWcbaGaa4NmaaqabaGccaaFGaGaeyOeI0Iaa4NmaaaadaqadaqaamaalaaabaGaa4xmaaqaaiaa+5gadaWgaaWcbaGaa4xmaaqabaaaaOGaey4kaSYaaSaaaeaacaGFXaaabaGaa4NBamaaBaaaleaacaGFYaaabeaaaaaakiaawIcacaGLPaaaaaa@65EF@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
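            <p>Equations 1 and 2 can be written out directly, as in the following sketch. The function names and the example values are hypothetical and are chosen only to make the arithmetic easy to follow; no handling is included for the degenerate case in which all replicates are identical and the denominator is zero.</p>

```python
from math import sqrt

def pooled_sigma2(x1, x2):
    """Equation 2: pooled variance scaled by (1/n1 + 1/n2)."""
    n1, n2 = len(x1), len(x2)
    mean1 = sum(x1) / n1
    mean2 = sum(x2) / n2
    ss1 = sum((x - mean1) ** 2 for x in x1)   # squared deviations, condition 1
    ss2 = sum((x - mean2) ** 2 for x in x2)   # squared deviations, condition 2
    return (ss1 + ss2) / (n1 + n2 - 2) * (1.0 / n1 + 1.0 / n2)

def t_statistic(x1, x2):
    """Equation 1: difference in means divided by the square root of sigma^2."""
    mean_diff = sum(x1) / len(x1) - sum(x2) / len(x2)
    return mean_diff / sqrt(pooled_sigma2(x1, x2))

# Hypothetical log2 expression values for one gene, three chips per condition.
condition_1 = [7.0, 8.0, 9.0]
condition_2 = [5.0, 6.0, 7.0]

sigma2 = pooled_sigma2(condition_1, condition_2)
t = t_statistic(condition_1, condition_2)
print(sigma2, t)
```

            <p>For these values the sums of squares are 2 in each condition, so the quantity in Equation 2 is (4/4) &#215; (1/3 + 1/3) = 2/3 and the statistic in Equation 1 is 2 divided by the square root of 2/3.</p>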
            <p>Affymetrix microarrays have multiple 25-mer probes for each gene on the chip. In the Latin Square dataset, there are about 500,000 25-mer probes. These probes are organized into probesets that target about 22,000 genes. Because there are multiple probes in each probeset, we do not expect all the probes to act independently of one another. Nonetheless, in order to examine the distribution of variances on a microarray, it is informative to begin our analysis at the probe level. Figure <figr fid="F1">1a</figr> shows the pooled error (<it>&#963;</it><sup>2 </sup>from Equation 2) as a function of the mean difference (<inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> from Equation 1) of the approximately 500,000 probes from probesets that represent null (not spiked in) genes from the comparison between Latin Square experiments 1 and 2. In this case, there are three chips in each condition so n<sub>1 </sub>= n<sub>2 </sub>= 3. To make this figure consistent with the data shown in the rest of this report, all of the data from all arrays in Figure <figr fid="F1">1a</figr> were log<sub>2 </sub>transformed before calculation of <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> and <it>&#963;</it><sup>2</sup>. We would expect, based on previous literature, a relationship to exist between probe intensity and probe variance on microarrays <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. We see in Figure <figr fid="F1">1a</figr> that such a relationship does in fact exist and that <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> and <it>&#963;</it><sup>2 </sup>are not independent at the probe level.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Standard error as a function of the difference in means</p>
               </caption>
               <text>
                  <p>Standard error as a function of the difference in means. Shown is <it>&#963;</it><sup>2 </sup>as a function of <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> (see Equation 1 in the text) for probes from null genes from the comparison of Latin Square experiments 1 and 2 for <b>(a) </b>all approximately 500,000 probes on the array, <b>(b) </b>approximately 22,000 probes after RMA summation but in the absence of quantile-quantile normalization, and <b>(c) </b>after background correction, quantile-quantile normalization, and RMA summation. A small number of outlying data points are excluded from each panel. RMA, Robust Multichip Average.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-1"/>
            </fig>
            <p>We argue in this report that <it>&#963;</it><sup>2 </sup>can be treated as approximately constant. This is clearly not true at the probe level in Figure <figr fid="F1">1a</figr>. Microarray analysis, however, is usually not performed directly at the probe level. For many microarray experiments, the desired analysis is at the gene level. A well-studied problem in the analysis of Affymetrix arrays is how best to summarize the multiple probes in a probeset to produce a single value for each gene on each chip <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. All of the probeset data in this report were generated with the log<sub>2</sub>-transformed Robust Multichip Average (RMA) summary statistic <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, which is a well-regarded and robust measurement that has been shown to work well in a variety of conditions <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. After summarization with the RMA statistic, our data can be represented as a single spreadsheet or matrix in which the columns represent experiments and the rows represent genes.</p>
            <p>Figure <figr fid="F1">1b</figr> shows <it>&#963;</it><sup>2 </sup>as a function of <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> for the approximately 22,000 probesets generated by the comparison of Latin Square experiments 1 and 2 after RMA summation. We note immediately that RMA summation suppresses the standard error. The values for probeset <it>&#963;</it><sup>2 </sup>in Figure <figr fid="F1">1b</figr> are on the order of 10 to 20 times smaller than the probe <it>&#963;</it><sup>2 </sup>observed in Figure <figr fid="F1">1a</figr>. In addition, we can tell by immediate inspection that the estimates of <it>&#963;</it><sup>2 </sup>in Figure <figr fid="F1">1b</figr> must contain errors because they are not symmetrical. The data in Figure <figr fid="F1">1</figr> are from null (not spiked in) genes. The expected value of <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> is therefore zero and there is no reason to believe that <it>&#963;</it><sup>2 </sup>should deviate from symmetry around zero. Clearly, in Figure <figr fid="F1">1b</figr>, however, there is a strong tendency for <it>&#963;</it><sup>2 </sup>to be larger when <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> exceeds zero. This must be due to some systematic error in the underlying data. RMA summation is usually accompanied by quantile-quantile normalization <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, which is designed to correct for systematic errors in microarray data. Figure <figr fid="F1">1c</figr> shows the relationship between <it>&#963;</it><sup>2 </sup>and <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> after both quantile-quantile normalization and RMA summation. We see that after quantile-quantile normalization, the standard error approaches a constant across the range of <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> scores. In the following section we show that the deviations from a constant value of <it>&#963;</it><sup>2 </sup>that remain after normalization and RMA summation are likely to contain errors because, even on normalized data, test statistics work better if they assume that <it>&#963;</it><sup>2 </sup>is constant.</p>
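            <p>For readers unfamiliar with quantile-quantile normalization, the following is a generic sketch of the idea and not the RMA implementation used in this report: each chip is forced to share the same empirical distribution by replacing every value with the mean, across chips, of the values that hold the same rank. The function name, the tie-breaking rule, and the example values are assumptions made for illustration.</p>

```python
def quantile_normalize(columns):
    """Give every column (chip) the same empirical distribution.

    columns: list of equal-length lists, one list of intensities per chip.
    Ties are broken by original position, which is a simplification.
    """
    n_rows = len(columns[0])
    n_cols = len(columns)
    # Row indices of each column, ordered from smallest to largest value.
    orders = [sorted(range(n_rows), key=lambda i, c=col: c[i])
              for col in columns]
    # Mean across chips of the values that share a rank.
    rank_means = [
        sum(col[order[r]] for col, order in zip(columns, orders)) / n_cols
        for r in range(n_rows)
    ]
    # Replace each value by the mean for its rank within its own chip.
    normalized = [[0.0] * n_rows for _ in columns]
    for out, order in zip(normalized, orders):
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
    return normalized

# Two hypothetical chips with three probes each.
chips = [[5.0, 2.0, 3.0],
         [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
print(normalized)
```

            <p>After this step both hypothetical chips contain the same set of values (1.5, 3.5, and 5.5), arranged according to each chip's original ranking, so chip-wide shifts in intensity can no longer masquerade as differences between conditions.</p>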
         </sec>
         <sec>
            <st>
               <p>The measured standard error either before or after quantile-quantile normalization is unreliable</p>
            </st>
            <p>In order to produce a reliable list of differentially expressed genes between two experimental conditions, we need a test statistic and an appropriate way to produce <it>P </it>values from that test statistic. It has recently become clear that the standard <it>t </it>test (Equations 1 and 2) has serious shortcomings as a test statistic for microarray data <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. There has been a great deal of recent interest in test statistics that ignore or 'shrink' the variance of each individual gene. For example, a popular alternative to the standard <it>t </it>test is the cyber <it>t </it>test <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, which uses Bayesian statistics to weight the variance of each individual gene with the variance of other genes on the array with similar intensities (see Materials and methods, below). In addition to the cyber <it>t </it>test, we can follow Allison and coworkers <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and describe a universe of possible test statistics with which to evaluate the null hypothesis that the expression of a given gene is the same in conditions 1 and 2:</p>
            <p>
               <display-formula id="M3">
                  <m:math name="gb-2007-8-5-r69-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>x</m:mi>
                                       <m:mo>&#175;</m:mo>
                                    </m:mover>
                                    <m:mn>1</m:mn>
                                 </m:msub>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>x</m:mi>
                                       <m:mo>&#175;</m:mo>
                                    </m:mover>
                                    <m:mn>2</m:mn>
                                 </m:msub>
                              </m:mrow>
                              <m:mrow>
                                 <m:msqrt>
                                    <m:mrow>
                                       <m:mi>B</m:mi>
                                       <m:msup>
                                          <m:mi>&#952;</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                       <m:mo>+</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>B</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:msup>
                                          <m:mi>&#963;</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:msqrt>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaadaWcaaqaaGqaaiqa=HhagaqeamaaBaaaleaacaaIXaaabeaakiabgkHiTiqa=HhagaqeamaaBaaaleaacaaIYaaabeaaaOqaamaakaaabaGaa8NqaiabeI7aXnaaCaaaleqabaGaaGOmaaaakiabgUcaRiaacIcacaaIXaGaeyOeI0Iaa8NqaiaacMcaiiGacqGFdpWCdaahaaWcbeqaaiaaikdaaaaabeaaaaaaaa@430C@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Here, <it>&#963;</it><sup>2 </sup>is the estimate of standard error for each gene, as in the denominator for the <it>t </it>statistic in Equation 1. In contrast, <it>&#952;</it><sup>2 </sup>is a single estimate of the standard error shared by every gene on the array. We take as our <it>&#952;</it><sup>2 </sup>simply the average of all <it>&#963;</it><sup>2 </sup>values. That is, if there are <it>N </it>genes on the array, then:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2007-8-5-r69-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msup>
                              <m:mi>&#952;</m:mi>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>N</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msubsup>
                                          <m:mi>&#963;</m:mi>
                                          <m:mi>i</m:mi>
                                          <m:mn>2</m:mn>
                                       </m:msubsup>
                                    </m:mrow>
                                    <m:mi>N</m:mi>
                                 </m:mfrac>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaacqaH4oqCdaahaaWcbeqaaiaaikdaaaGccqGH9aqpdaaeWbqaamaalaaabaacciGae83Wdm3aa0baaSqaaGqaaiaa+LgaaeaacaaIYaaaaaGcbaGaa4NtaaaaaSqaaiaa+LgacqGH9aqpcaaIXaaabaGaa4NtaaqdcqGHris5aaaa@4123@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The shrinkage factor, B, can vary between 0 and 1 in Equation 3. When B = 0, Equation 3 reduces to the standard <it>t </it>test of Equation 1. When B = 1, the statistic essentially ignores the per-gene variance: it reduces to a score based only on the difference between the mean expression in the two conditions, divided by a constant.</p>
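            <p>The behaviour of Equation 3 at the two extremes of B can be illustrated with a minimal sketch. The function name <it>shrunken_statistic</it> and the argument names <it>mean_diffs</it> (per-gene differences between the two condition means) and <it>sigma2</it> (per-gene <it>&#963;</it><sup>2 </sup>values) are hypothetical and chosen only for illustration; the per-gene inputs are assumed to have been computed already.</p>

```python
from math import sqrt
from statistics import mean


def shrunken_statistic(mean_diffs, sigma2, B):
    """Score each gene as mean_diff / sqrt(B * theta2 + (1 - B) * sigma2).

    mean_diffs : per-gene difference between the two condition means
    sigma2     : per-gene sigma^2 (the squared denominator of the t statistic)
    B          : shrinkage factor between 0 and 1
    All names are illustrative assumptions, not taken from the original analysis.
    """
    if B > 1.0 or 0.0 > B:
        raise ValueError("B must lie between 0 and 1")
    # theta^2 is the average of all per-gene sigma^2 values on the array.
    theta2 = mean(sigma2)
    return [d / sqrt(B * theta2 + (1.0 - B) * s2)
            for d, s2 in zip(mean_diffs, sigma2)]
```

            <p>With B = 0 every gene is divided by its own <it>&#963;</it>, as in the standard <it>t </it>test; with B = 1 every gene is divided by the same constant, so the ranking of genes is determined by the mean differences alone.</p>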
            <p>The consequences of choosing different summary statistics are shown in Figure <figr fid="F2">2</figr>. A receiver operating characteristic (ROC) graph is shown in Figure <figr fid="F2">2a</figr>, in which we use different statistics to rank the most differentially expressed genes in Latin Square experiment 8 versus experiment 9. To generate an ROC curve for each statistic, we assign a score to each gene on the chip and sort the resulting list. For each gene in the sorted list we ask, if the threshold for significance were set to include only the genes with scores equal to or greater than the current gene, then how many true positives and false positives would be captured? An algorithm capable of perfectly separating true and false positives would generate a curve that would include a point in the upper left corner of Figure <figr fid="F2">2a</figr>, because there would exist a threshold cutoff at which all 42 spiked-in genes would be captured and all approximately 22,000 null genes would be excluded. We see in Figure <figr fid="F2">2a</figr> that the standard <it>t </it>test performs poorly whereas the cyber <it>t </it>test does well, as does the statistic defined by Equation 3 with B = 1.</p>
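            <p>The ranking procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Figure 2: the function names are hypothetical, <it>scores</it> is assumed to hold one score per gene with larger values meaning stronger evidence (for example, absolute values of a test statistic), <it>is_spiked</it> flags the spiked-in genes, and tied scores are processed in arbitrary order.</p>

```python
def roc_points(scores, is_spiked):
    """Walk down the genes in order of decreasing score.

    Returns one (false_positives, true_positives) pair per gene, giving the
    cumulative counts captured if the threshold were set at that gene.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    tp = 0
    fp = 0
    for i in order:
        if is_spiked[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points


def true_positives_at(points, max_false_positives):
    """Most true positives captured without exceeding the false-positive limit."""
    return max((tp for fp, tp in points if max_false_positives >= fp), default=0)
```

            <p>A perfect statistic would produce a point with zero false positives and all spiked-in genes captured; calling <it>true_positives_at(points, 4)</it> gives the kind of fixed-threshold summary used for the box plots in Figure 2.</p>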
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The performance of test statistics in ranking the Latin Square data</p>
               </caption>
               <text>
                  <p>The performance of test statistics in ranking the Latin Square data. <b>(a) </b>ROC curves for Latin Square experiments 8 versus 9. <b>(b,c) </b>The number of true positives captured for all 14 2&#215; Latin Square experiments at a threshold that also captured four false positives (dashed line in panel a) in the absence (panel b) and presence (panel c) of background correction and quantile-quantile normalization. B refers to the 'shrinkage factor' in Equation 3 (see text). For this and the following figures in the report, data were summarized with RMA before application of the test statistic. RMA, Robust Multichip Average; ROC, receiver operating characteristic.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-2"/>
            </fig>
            <p>To explore the effects of variance shrinkage and normalization on statistic performance across multiple Latin Square comparisons, we choose an arbitrary threshold; we consider how many true positives are captured by each statistic for a threshold cutoff that also captures four false positives (Figure <figr fid="F2">2a</figr>, dashed vertical line). The box plots in Figure <figr fid="F2">2</figr> show this value for each statistic over the 14 Latin Square experiments in which the spiked-in ratios differ by a factor of two in the absence (Figure <figr fid="F2">2b</figr>) and presence (Figure <figr fid="F2">2c</figr>) of quantile-quantile normalization. We note that whether one uses Bayesian statistics to weight the variance of each gene (as in the cyber <it>t </it>test) or shrinks the standard error according to Equation 3 (with B approaching 1), much better performance is achieved than with the standard <it>t </it>test, regardless of normalization scheme. This suggests that both before and after quantile-quantile normalization, the variance reported for each gene is unreliable.</p>
            <p>In Figure <figr fid="F2">2</figr>, the B = 1 form of Equation 3 performs nearly the same in the absence (Figure <figr fid="F2">2b</figr>) and presence (Figure <figr fid="F2">2c</figr>) of quantile-quantile normalization. In contrast, the standard <it>t </it>test performs much better under quantile-quantile normalization (Figure <figr fid="F2">2c</figr>) than with un-normalized data (Figure <figr fid="F2">2b</figr>). This improvement must occur because either <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> or <it>&#963;</it><sup>2</sup>, or both, improve after normalization. Figure <figr fid="F3">3</figr> shows the relationship between <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> and <it>&#963;</it><sup>2 </sup>before (Figure <figr fid="F3">3a</figr>) and after (Figure <figr fid="F3">3b</figr>) quantile-quantile normalization for the comparison of experiments 1 and 2 in the Latin Square dataset. We see that <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> is perturbed much less than <it>&#963;</it><sup>2 </sup>by normalization. The fact that the standard <it>t </it>test improves after normalization (Figure <figr fid="F2">2</figr>), however, suggests that the <it>&#963;</it><sup>2 </sup>values after normalization are more appropriate. This is something of a paradox. How can a transformation that discards about 90% of the original estimates for <it>&#963;</it><sup>2 </sup>improve performance? We argue that the resolution to this apparent paradox is that the original estimates of <it>&#963;</it><sup>2 </sup>after RMA summation are highly unreliable. Quantile-quantile normalization replaces the original estimates of <it>&#963;</it><sup>2 </sup>with values that approach a constant (Figure <figr fid="F1">1c</figr>). This improves the performance of the standard <it>t </it>test (Figure <figr fid="F2">2</figr>). That is, quantile-quantile normalization suppresses the original measured variance and therefore allows the standard <it>t </it>test to move closer to the performance of algorithms, such as the cyber <it>t </it>test and the B = 1 form of Equation 3, that suppress the importance of the original variance regardless of normalization scheme.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Estimates of standard error do not survive quantile-quantile normalization</p>
               </caption>
               <text>
                  <p>Estimates of standard error do not survive quantile-quantile normalization. <b>(a,b) </b><inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> (panel a) and <it>&#963;</it><sup>2 </sup>(panel b) before and after background correction and quantile-quantile normalization for the comparison of the Latin Square experiments 1 and 2. Fits shown are to a linear regression. <b>(c) </b>Box plot showing the R<sup>2 </sup>values from a linear fit for all 14 2&#215; Latin Square comparisons.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Different analysis schemes yield very different distributions of <it>P </it>values</p>
            </st>
            <p>We have argued that quantile-quantile normalization is effective in part because it replaces the unreliable estimates of <it>&#963;</it><sup>2 </sup>with a distribution that approaches a constant (Figures <figr fid="F1">1</figr> and <figr fid="F3">3</figr>) and that, furthermore, test statistics appear to work better when they assume that <it>&#963;</it><sup>2 </sup>approaches a constant (Figure <figr fid="F2">2</figr>). We now turn to the issue of how we can utilize this assumption of constant standard error to produce more accurate estimates of <it>P </it>values.</p>
            <p>If the assumptions of normality, equal variance, and independence were met, then we would of course expect the standard <it>t </it>test in Equation 1 to follow a <it>t </it>distribution with appropriate degrees of freedom for null genes. If any of these assumptions is violated, however, then the distribution of standard <it>t </it>scores may not follow a <it>t </it>distribution. We can examine how well these assumptions are met for the standard <it>t </it>test by using the <it>t </it>distribution to produce <it>P </it>values for null genes. If all the assumptions are met, then the <it>P </it>values produced from the <it>t </it>distribution should follow a uniform distribution. Figure <figr fid="F4">4a</figr> (blue lines) shows the <it>P </it>values produced by the <it>t </it>distribution for the standard <it>t </it>test (with four degrees of freedom, because n<sub>10 </sub>= n<sub>11 </sub>= 3) compared with the expected <it>P </it>values under a uniform distribution for the comparison of Latin Square experiments 10 versus 11 after RMA summation and quantile-quantile normalization. We see that the actual distribution of <it>P </it>values produced by the standard <it>t </it>test deviates considerably from the expected <it>P </it>values. Clearly, one or more of the assumptions of the standard <it>t </it>test is violated in this case.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Actual versus expected <it>P </it>values under uniform distribution for null genes of Latin Square 2&#215; comparisons</p>
               </caption>
               <text>
                  <p>Actual versus expected <it>P </it>values under uniform distribution for null genes of Latin Square 2&#215; comparisons. <b>(a) </b>Actual versus expected <it>P </it>values for the comparison of experiment 10 versus 11. Black dashes indicate the y = x diagonal. <b>(b) </b>Box plots showing the results of the Kolmogorov-Smirnov test for each of the 14 Latin Square 2&#215; comparisons under the null hypothesis that the observed distribution of <it>P </it>values was the same as a uniform distribution of <it>P </it>values. The red line is the <it>P </it>= 0.05 level. <b>(c) </b>Same data as in panel b but with a magnified y-axis.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-4"/>
            </fig>
            <p>Given the poor performance of the standard <it>t </it>test in ranking differentially expressed genes (Figure <figr fid="F2">2</figr>), it is perhaps not surprising that the <it>P </it>values generated by the standard <it>t </it>test fall so far from uniform. Does the cyber <it>t </it>test, which clearly outperforms the standard <it>t </it>test in ranking differentially expressed genes (Figure <figr fid="F2">2</figr>), produce <it>P </it>values closer to a uniform distribution? Rather than determining <it>&#963;</it><sup>2 </sup>independently for each gene, the cyber <it>t </it>test uses Bayesian statistics to weigh the variance of each gene by the variance of genes on the array with similar intensities. Because the estimate for the variance of each gene is not independent, the authors of the cyber <it>t </it>test do not expect the cyber <it>t </it>test to follow a simple <it>t </it>distribution with n - 2 degrees of freedom. Indeed, the <it>P </it>values reported by the R implementation of the cyber <it>t </it>test that we used are generated with an assumption of 22 degrees of freedom, given three experiments in each condition and the default parameters (see Materials and methods, below). Figure <figr fid="F4">4a</figr> (black lines) presents the <it>P </it>values reported by the R implementation of the cyber <it>t </it>test. We see that, despite the correction for lack of independence by increasing the number of degrees of freedom, the <it>P </it>values reported by the cyber <it>t </it>test are also poorly described by a uniform distribution.</p>
            <p>If the cyber <it>t </it>test does not appear to follow a <it>t </it>distribution, then can we find a more appropriate distribution that it does follow? In Figure <figr fid="F1">1c</figr>, we have seen that <it>&#963;</it><sup>2 </sup>approaches a constant for null genes after RMA summation and quantile-quantile normalization. The cyber <it>t </it>test estimates the prior variance of each gene as a function of that gene's intensity. After RMA summation and quantile-quantile normalization, that prior variance should be close to constant. Because in the Latin Square dataset we have small sample sizes, the Bayesian cyber <it>t </it>estimate gives a good deal of weight to the prior variance, and therefore the cyber <it>t </it>estimate of variance for each gene will also approach a constant. As a test statistic approaches <inline-formula><m:math name="gb-2007-8-5-r69-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>1</m:mn></m:msub><m:mo>&#8722;</m:mo><m:msub><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#175;</m:mo></m:mover><m:mn>2</m:mn></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaaceWF4bGbaebadaWgaaWcbaGaaGymaaqabaGccqGHsislceWF4bGbaebadaWgaaWcbaGaaGOmaaqabaaaaa@381E@</m:annotation></m:semantics></m:math></inline-formula> divided by a constant, it will become normally distributed. We might anticipate, therefore, that the distribution of all cyber <it>t </it>scores should approach a normal distribution.</p>
            <p>We can check the validity of the above line of reasoning by generating <it>P </it>values for the cyber <it>t </it>scores under the assumption that they are normally distributed. For the comparison of null genes between Latin Square experiments 10 and 11, we calculate the mean (<inline-formula><m:math name="gb-2007-8-5-r69-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:mi>c</m:mi><m:mi>y</m:mi><m:mi>b</m:mi><m:mi>e</m:mi><m:mi>r</m:mi><m:mi>T</m:mi></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaadaqdaaqaaGqaaiaa=ngacaWF5bGaa8Nyaiaa=vgacaWFYbGaa8hvaaaaaaa@38B6@</m:annotation></m:semantics></m:math></inline-formula>) and standard deviation (<it>&#963;</it><sub>cyberT</sub>) of all the cyber <it>t </it>scores. We can then easily calculate the <it>P </it>value from the cumulative distribution function (cdf) of the standard normal distribution for each cyberT score as follows:</p>
            <p>
               <display-formula id="M4">
                  <m:math name="gb-2007-8-5-r69-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>c</m:mi>
                           <m:mi>y</m:mi>
                           <m:mi>b</m:mi>
                           <m:mi>e</m:mi>
                           <m:mi>r</m:mi>
                           <m:mi>T</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>*</m:mo>
                           <m:mi>c</m:mi>
                           <m:mi>d</m:mi>
                           <m:mi>f</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mo>|</m:mo>
                                 <m:mi>c</m:mi>
                                 <m:mi>y</m:mi>
                                 <m:mi>b</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>r</m:mi>
                                 <m:mi>T</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mover accent="true">
                                    <m:mrow>
                                       <m:mi>c</m:mi>
                                       <m:mi>y</m:mi>
                                       <m:mi>b</m:mi>
                                       <m:mi>e</m:mi>
                                       <m:mi>r</m:mi>
                                       <m:mi>T</m:mi>
                                    </m:mrow>
                                    <m:mo stretchy="true">&#175;</m:mo>
                                 </m:mover>
                                 <m:mo>|</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>&#963;</m:mi>
                                    <m:mrow>
                                       <m:mi>c</m:mi>
                                       <m:mi>y</m:mi>
                                       <m:mi>b</m:mi>
                                       <m:mi>e</m:mi>
                                       <m:mi>r</m:mi>
                                       <m:mi>T</m:mi>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaacaWFWbGaaiikaiaa=ngacaWF5bGaa8Nyaiaa=vgacaWFYbGaa8hvaiaacMcacqGH9aqpcaaIYaGaaiOkaiaa=ngacaWFKbGaa8NzaiaacIcadaWcaaqaaiaacYhacaWFJbGaa8xEaiaa=jgacaWFLbGaa8NCaiaa=rfacqGHsisldaqdaaqaaiaa=ngacaWF5bGaa8Nyaiaa=vgacaWFYbGaa8hvaaaacaGG8baabaGaeq4Wdm3aaSbaaSqaaiaa=ngacaWF5bGaa8Nyaiaa=vgacaWFYbGaa8hvaaqabaaaaOGaaiykaaaa@56BC@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Figure <figr fid="F4">4a</figr> (red line) shows that the <it>P </it>values produced by the normal distribution of Equation 4 fall very close to a uniform distribution. This strongly supports our assertion that the cyber <it>t </it>estimate of <it>&#963;</it><sup>2 </sup>is approximately constant. For the rest of this report, we refer to the method of generating <it>P </it>values from the cyber <it>t </it>test by assuming a normal distribution as 'cyber-<it>t</it>-Normal'. We emphasize that there are two differences between <it>P </it>values produced by cyber-<it>t</it>-Normal and the <it>P </it>values reported by the cyber <it>t </it>test. One is that we assume a normal distribution rather than a <it>t </it>distribution. The other is that we calculate the <it>P </it>value for each gene by comparison with a distribution of all genes on the array. That is, we assume that all the genes on the array follow a single distribution, whereas the <it>P </it>values produced by the cyber <it>t </it>test are generated under the assumption that each gene follows its own independent distribution based on the Bayesian estimate of <it>&#963;</it><sup>2 </sup>for that gene.</p>
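            <p>The cyber-<it>t</it>-Normal calculation can be sketched in a few lines. The sketch assumes that the cyber <it>t </it>scores have already been computed elsewhere and are passed in as a plain list; the function name <it>normal_p_values</it> is hypothetical. It also assumes that the <it>P </it>value is taken as the two-sided tail area, 2 &#215; (1 &#8722; cdf(|z|)), so that a score equal to the array-wide mean receives a <it>P </it>value of 1; this tail-area reading of how the cdf is applied is our assumption.</p>

```python
from statistics import NormalDist, mean, stdev


def normal_p_values(scores):
    """Two-sided P values for scores compared against their own distribution.

    Each score is standardized with the mean and (sample) standard deviation
    of all scores on the array, then referred to the standard normal
    distribution. The two-sided tail area is an assumption of this sketch.
    """
    center = mean(scores)
    spread = stdev(scores)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [2.0 * (1.0 - std_normal.cdf(abs(s - center) / spread))
            for s in scores]
```

            <p>Because the mean and standard deviation are estimated from all scores at once, every gene is compared with a single array-wide distribution, which is the second of the two differences from the cyber <it>t </it>test noted above.</p>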
            <p>How close are the <it>P </it>values produced by the cyber-<it>t</it>-Normal scheme to a uniform distribution? We can use the Kolmogorov-Smirnov test to evaluate the null hypothesis that the distribution of <it>P </it>values from each statistic is identical to the uniform distribution of <it>P </it>values. The Kolmogorov-Smirnov test is a nonparametric test and can therefore suffer from low power. On the other hand, we are using the test to evaluate a distribution with over 22,000 data points, and so we are confident that even small deviations from our assumptions will produce small <it>P </it>values. Figure <figr fid="F4">4b,c</figr> shows the -log<sub>10 </sub>of the <it>P </it>value of the Kolmogorov-Smirnov test for all 14 possible 2&#215; Latin Square comparisons. We see that, although there is considerable variability across all 14 pairs of experiments, <it>P </it>values produced by the cyber-<it>t</it>-Normal method are a good deal closer to uniform than <it>P </it>values produced by either the standard <it>t </it>or cyber <it>t </it>methods.</p>
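            <p>As an illustration of the comparison against a uniform distribution, the following sketch computes the one-sample Kolmogorov-Smirnov statistic for a list of <it>P </it>values and converts it to a <it>P </it>value with the large-sample Kolmogorov series. The function names are hypothetical, and the asymptotic approximation is an assumption of this sketch that is reasonable only for large numbers of genes; a statistics package would normally be used instead.</p>

```python
from math import exp, sqrt


def ks_statistic_uniform(p_values):
    """Largest gap between the empirical cdf of p_values and the uniform cdf."""
    ordered = sorted(p_values)
    n = len(ordered)
    d = 0.0
    for rank, p in enumerate(ordered, start=1):
        # The empirical cdf jumps from (rank - 1) / n to rank / n at p,
        # while the uniform cdf on [0, 1] equals p itself.
        d = max(d, rank / n - p, p - (rank - 1) / n)
    return d


def ks_p_value_asymptotic(d, n, terms=100):
    """Large-sample P value: 2 * sum over k of (-1)^(k-1) * exp(-2 k^2 x^2)."""
    x = sqrt(n) * d
    if 0.2 > x:
        # The tail probability is indistinguishable from 1 in this range,
        # and the alternating series converges slowly, so return 1 directly.
        return 1.0
    total = sum((-1) ** (k - 1) * exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

            <p>With more than 22,000 null genes, even a small maximum gap between the observed and uniform distributions is multiplied by a large square-root factor, which is why small deviations translate into small Kolmogorov-Smirnov <it>P </it>values.</p>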
         </sec>
         <sec>
            <st>
               <p>Imperfect normalization contributes to deviations from a perfectly normal distribution</p>
            </st>
            <p>The red lines in Figure <figr fid="F4">4b,c</figr> represent a <it>P </it>value of 0.05 for the null hypothesis that a statistic produces <it>P </it>values that are uniform. Figure <figr fid="F4">4c</figr> contains the same data as Figure <figr fid="F4">4b</figr> with a magnified scale. We see that even though the cyber-<it>t</it>-Normal method produces <it>P </it>values that are a good deal closer to uniform than the other methods, there is still significant deviation from a perfectly uniform distribution. One possible explanation for this deviation is imperfect normalization from the quantile-quantile procedure. The top panels in Figure <figr fid="F5">5</figr> show cyber <it>t </it>scores after RMA summation in the presence (top right panel) and absence (top left panel) of quantile-quantile normalization for the null genes for a comparison of Latin Square experiments 8 and 9. We see that even after quantile-quantile normalization, there remain systematic differences in the null genes (top right panel). Such systematic differences even after normalization have been observed in other datasets <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. To correct for these differences, we can perform an additional normalization, which we call 'statistics-level normalization'. To do this, we simply fit a local (Loess) regression line to the data in the top panels of Figure <figr fid="F5">5</figr> with a window size of 1,000 data points. We then subtract from each gene the value for that gene from the Loess regression line. The results of this subtraction are shown in the bottom panels of Figure <figr fid="F5">5</figr>. We see in Figure <figr fid="F4">4b,c</figr> that when we perform this additional normalization, the <it>P </it>values produced by cyber-<it>t</it>-Normal become slightly closer to uniform. 
For the rest of the report, we refer to the calculation of <it>P </it>values by cyber-<it>t</it>-Normal after RMA summation, quantile-quantile normalization, and statistics-level normalization as 'scheme 4'.</p>
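            <p>The subtraction step described above can be sketched as follows. This is a minimal illustration rather than the code used for Figure 5: it replaces a full Loess implementation with a tricube-weighted local linear fit over a fixed number of neighboring points, and the per-gene covariate on the horizontal axis (called covariate here) and the function names are assumptions made for the example.</p>

```python
def local_linear_fit(x, y, window):
    """Fitted value at every point from a tricube-weighted linear fit over
    the `window` nearest points by rank in x (a simplified Loess-style
    smoother, not a full Loess implementation)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    k = min(window, n)
    fitted = [0.0] * n
    for rank, i in enumerate(order):
        # Window of k consecutive ranks, centred on the point where possible.
        start = max(0, min(rank - k // 2, n - k))
        idx = order[start:start + k]
        span = max(abs(x[j] - x[i]) for j in idx)
        sw = swx = swy = swxx = swxy = 0.0
        for j in idx:
            dx = x[j] - x[i]  # centre x on the point being fitted
            w = (1.0 - (abs(dx) / span) ** 3) ** 3 if span > 0 else 1.0
            sw += w
            swx += w * dx
            swy += w * y[j]
            swxx += w * dx * dx
            swxy += w * dx * y[j]
        denom = sw * swxx - swx * swx
        if denom > 1e-12 * sw * swxx:
            # Intercept of the weighted least-squares line, i.e. the fit at dx = 0.
            fitted[i] = (swy * swxx - swx * swxy) / denom
        else:
            # Degenerate window: fall back to the weighted mean.
            fitted[i] = swy / sw
    return fitted


def statistics_level_normalize(scores, covariate, window=1000):
    """Subtract the local regression value from each gene's score."""
    fitted = local_linear_fit(covariate, scores, window)
    return [s - f for s, f in zip(scores, fitted)]


# Toy check: a purely systematic linear trend is removed entirely.
toy_x = [i / 10 for i in range(200)]
toy_scores = [2 * v + 1 for v in toy_x]
toy_residuals = statistics_level_normalize(toy_scores, toy_x, window=50)
```

            <p>In this sketch the window is defined by rank, so a window of 1,000 always contains 1,000 genes regardless of how unevenly they are spread along the covariate.</p>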
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Fitting data to a local regression removes systematic variations present after quantile-quantile normalization</p>
               </caption>
               <text>
                  <p>Fitting data to a local regression removes systematic variations present after quantile-quantile normalization. Shown is a comparison of cyber <it>t </it>scores for the null genes of the Latin Square comparison of experiment 8 versus 9 in the presence and absence of quantile-quantile and statistics-level normalization (see text). Red lines are Loess regression lines with a window size of 1,000.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Cross-hybridization also contributes to deviations from a perfectly normal distribution</p>
            </st>
            <p>Another possible cause of the deviations from a uniform distribution in Figure <figr fid="F4">4</figr> is 'off-target' hybridization, or cross-hybridization. We might expect that some probe sets respond to changes in genes other than those that they were designed to detect. If genes that are annotated as null are in fact responding to changes in spiked-in genes, this would cause <it>P </it>values to be smaller than expected under a uniform distribution. We can examine the effect of cross-hybridization by taking advantage of the experimental design of the Latin Square dataset. For each of the 91 possible pairs of experiments in the Latin Square dataset, we can compute the average difference between spike-in concentrations. That is, if the spike-in concentrations for the 42 genes in experiment X are ([X<sub>1</sub>], [X<sub>2</sub>], [X<sub>3</sub>] ... [X<sub>42</sub>]) and for experiment Y are ([Y<sub>1</sub>], [Y<sub>2</sub>], [Y<sub>3</sub>] ... [Y<sub>42</sub>]), then we define the average difference in concentration as follows:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2007-8-5-r69-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mn>42</m:mn>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>|</m:mo>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>X</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>Y</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mo>|</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>42</m:mn>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
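            <p>As a small worked illustration of this definition, the average absolute difference can be computed as below. The function name and the toy concentrations are hypothetical; the real calculation uses the 42 spike-in concentrations of each experiment.</p>

```python
def average_spike_difference(conc_x, conc_y):
    """Mean absolute difference between paired spike-in concentrations."""
    if len(conc_x) != len(conc_y):
        raise ValueError("both experiments must list the same spike-in genes")
    total = sum(abs(a - b) for a, b in zip(conc_x, conc_y))
    return total / len(conc_x)


# Toy example with four spike-in genes instead of 42 (values are made up).
experiment_x = [0.0, 0.25, 0.5, 1.0]
experiment_y = [0.25, 0.5, 1.0, 2.0]
toy_difference = average_spike_difference(experiment_x, experiment_y)
```

            <p>Because the absolute value is taken gene by gene, increases and decreases in concentration do not cancel each other.</p>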
            <p>Figure <figr fid="F6">6</figr> shows the -log<sub>10</sub>(<it>P </it>value) from the Kolmogorov-Smirnov test as a function of this average difference in spike-in concentration. As in Figure <figr fid="F4">4</figr>, the Kolmogorov-Smirnov test evaluates the null hypothesis that the distribution of <it>P </it>values produced by each statistic follows a uniform distribution. As we go from left to right on the x axis, we find experiments in which the arrays were exposed to greater differences in RNA concentrations. The data in this figure were constructed from a dataset containing only null genes. Despite the fact that the spike-in genes are removed from this dataset, we see an increase in the deviation from a uniform distribution as the average difference in spike-in concentration increases. This must be due to nonspecific hybridization; that is, probes that target null genes are responding to changes in the spiked-in genes. Because the chips in the two conditions are exposed to some differences in RNA even in the 2&#215; comparisons, we can explain some of the deviations from a uniform distribution in Figure <figr fid="F4">4</figr> by cross-hybridization.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Cross-hybridization distorts <it>P </it>values for null genes in the Latin Square dataset</p>
               </caption>
               <text>
                  <p>Cross-hybridization distorts <it>P </it>values for null genes in the Latin Square dataset. Shown are the results of the Kolmogorov-Smirnov test for the null genes for all 91 Latin Square comparisons as a function of the average difference in spike-in concentration (see text). The null hypothesis for the Kolmogorov-Smirnov test is that the observed <it>P </it>values are identical to a uniform distribution. Error bars are standard errors. The red line is the <it>P </it>= 0.05 level.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Experiments consisting of technical replicates are closer to a normal distribution</p>
            </st>
            <p>Technical replicates consist of arrays that have been exposed to identical RNA. Every gene within a comparison of technical replicates is therefore a null gene. If some of the deviation from a uniform distribution in Figure <figr fid="F4">4</figr> were caused by cross-hybridization, then we would anticipate that experiments consisting entirely of technical replicates would be closer to a uniform distribution. The Latin Square experiment shown in Figure <figr fid="F4">4</figr> has only <it>n </it>= 3 arrays for each condition, however, which does not allow for comparisons within an experimental condition by either the cyber <it>t </it>or standard <it>t </it>test. Fortunately, a dataset with six technical replicates has been published <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. This dataset, which was designed to measure the effect of different RNA amplification schemes, consists of six technical replicates in each of four distinct groups for a total of 24 arrays. Within each of the four groups, there are 10 possible ways to split the six technical replicates into two groups of three (20 ways to choose three arrays out of six, halved because the two groups are interchangeable). There are therefore a total of 40 distinct comparisons of technical replicates with n<sub>1 </sub>= n<sub>2 </sub>= 3 within the 24 arrays of this dataset.</p>
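            <p>The count of comparisons can be checked by enumeration. The sketch below lists every way to split one group of six replicates into two unordered groups of three; the function name and the replicate labels are illustrative only.</p>

```python
from itertools import combinations


def balanced_splits(items):
    """All ways to split an even-sized group into two unordered halves."""
    items = list(items)
    half = len(items) // 2
    first = items[0]
    splits = []
    # Pinning the first item to group A avoids counting each split twice.
    for rest in combinations(items[1:], half - 1):
        group_a = (first,) + rest
        group_b = tuple(x for x in items if x not in group_a)
        splits.append((group_a, group_b))
    return splits


replicates = ["rep1", "rep2", "rep3", "rep4", "rep5", "rep6"]
splits_per_group = balanced_splits(replicates)
total_comparisons = 4 * len(splits_per_group)  # four groups in the dataset
```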
            <p>For each of these 40 possible <it>n </it>= 3 versus <it>n </it>= 3 comparisons of technical replicates, we used the Kolmogorov-Smirnov test to evaluate the null hypothesis that the <it>P </it>values produced by various schemes were identical to the uniform distribution of <it>P </it>values. The box plots in Figure <figr fid="F7">7</figr> show the results of this calculation. Figure <figr fid="F7">7b</figr> is identical to Figure <figr fid="F7">7a</figr> except the y axis has been magnified. We see that for more than half of the 40 comparisons under 'scheme 4' there is no statistical difference between the generated <it>P </it>values and the uniform distribution at a <it>P </it>value cutoff of 0.05. The fact that the distribution of <it>P </it>values produced by 'scheme 4' for these technical replicates is closer to a uniform distribution than for the null genes of the 2&#215; Latin Square experiments in Figure <figr fid="F4">4</figr> suggests that some of the deviation from a uniform distribution in Figure <figr fid="F4">4</figr> is caused by cross-hybridization.</p>
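            <p>For readers who want to reproduce this kind of check, the following is a minimal sketch of a one-sample Kolmogorov-Smirnov test against the uniform distribution on the interval from 0 to 1. It uses the asymptotic Kolmogorov series for the <it>P </it>value, which is an assumption suited to the thousands of genes on an array and is not necessarily the implementation used for Figure 7; the function name is hypothetical.</p>

```python
import math


def ks_uniform(p_values):
    """Return (D, asymptotic P value) for the null hypothesis that
    p_values are drawn from the uniform distribution on [0, 1]."""
    p = sorted(p_values)
    n = len(p)
    # Largest gap between the empirical CDF and the uniform CDF F(v) = v.
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(p))
    lam = math.sqrt(n) * d
    # Kolmogorov series: 2 * sum over j of (-1)^(j-1) * exp(-2 j^2 lam^2).
    total = 0.0
    j = 1
    while True:
        term = math.exp(-2.0 * j * j * lam * lam)
        total += term if j % 2 == 1 else -term
        if 1e-16 > term:
            break
        j += 1
    return d, min(1.0, max(0.0, 2.0 * total))


# Evenly spread P values look uniform; squared ones are skewed towards zero.
n_genes = 1000
uniform_like = [(i + 0.5) / n_genes for i in range(n_genes)]
too_small = [v * v for v in uniform_like]
d_uniform, p_uniform = ks_uniform(uniform_like)
d_skewed, p_skewed = ks_uniform(too_small)
```

            <p>A set of <it>P </it>values that is skewed towards zero, as expected under cross-hybridization, produces a large D and a very small Kolmogorov-Smirnov <it>P </it>value in this sketch.</p>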
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Actual versus expected <it>P </it>values for a technical replicate dataset</p>
               </caption>
               <text>
                  <p>Actual versus expected <it>P </it>values for a technical replicate dataset. Shown are results of the Kolmogorov-Smirnov test for all 40 possible <it>n </it>= 3 versus <it>n </it>= 3 combinations of the technical replicates from the dataset of Cope and coworkers [16]. The null hypothesis for the Kolmogorov-Smirnov test is that the observed <it>P </it>values are identical to a uniform distribution. The red line is the <it>P </it>= 0.05 level. <b>(a,b) </b>The same data are shown in both panels but panel b has a magnified y-axis.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-7"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>'Scheme 4' should be conservative in real experiments</p>
            </st>
            <p>The graphs in Figures <figr fid="F4">4</figr> to <figr fid="F7">7</figr> were created using data from only null genes, which we know are not differentially expressed. In 'real' experiments, of course, we will have a mixture of null and not-null genes and we will not know which genes are null and which are differentially expressed. When we compare genes in two conditions, we assume that null genes will follow a normal distribution of scores whereas genes that are not null will not follow this same distribution. Because the majority of genes are probably null, the overall distribution of scores from a test statistic will largely reflect null genes. We measure the significance of genes as deviations from this background distribution of presumably null genes. Of course, not all of the genes will be null, and we will therefore not be able to measure <inline-formula><m:math name="gb-2007-8-5-r69-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:mi>c</m:mi><m:mi>y</m:mi><m:mi>b</m:mi><m:mi>e</m:mi><m:mi>r</m:mi><m:mi>T</m:mi><m:mi>N</m:mi><m:mi>u</m:mi><m:mi>l</m:mi><m:mi>l</m:mi><m:mi>s</m:mi></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow></m:semantics></m:math></inline-formula> and <it>&#963;</it><sub>cyberTNulls </sub>(the average and standard deviation of cyber <it>t </it>scores from null genes) but only <inline-formula><m:math name="gb-2007-8-5-r69-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:mi>c</m:mi><m:mi>y</m:mi><m:mi>b</m:mi><m:mi>e</m:mi><m:mi>r</m:mi><m:mi>T</m:mi><m:mi>A</m:mi><m:mi>l</m:mi><m:mi>l</m:mi></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow></m:semantics></m:math></inline-formula> and <it>&#963;</it><sub>cyberTAll</sub>, which we define as the observed mean and standard deviation of cyber <it>t </it>scores for all genes.</p>
            <p>We would still expect, however, the number of upregulated genes to be approximately equal to the number of downregulated genes. We expect, therefore, that:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2007-8-5-r69-i11" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mover accent="true">
                              <m:mrow>
                                 <m:mi>c</m:mi>
                                 <m:mi>y</m:mi>
                                 <m:mi>b</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>r</m:mi>
                                 <m:mi>T</m:mi>
                                 <m:mi>N</m:mi>
                                 <m:mi>u</m:mi>
                                 <m:mi>l</m:mi>
                                 <m:mi>l</m:mi>
                                 <m:mi>s</m:mi>
                              </m:mrow>
                              <m:mo stretchy="true">&#175;</m:mo>
                           </m:mover>
                           <m:mo>&#8776;</m:mo>
                           <m:mover accent="true">
                              <m:mrow>
                                 <m:mi>c</m:mi>
                                 <m:mi>y</m:mi>
                                 <m:mi>b</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>r</m:mi>
                                 <m:mi>T</m:mi>
                                 <m:mi>A</m:mi>
                                 <m:mi>l</m:mi>
                                 <m:mi>l</m:mi>
                              </m:mrow>
                              <m:mo stretchy="true">&#175;</m:mo>
                           </m:mover>
                           <m:mo>&#8776;</m:mo>
                           <m:mn>0</m:mn>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Moreover, cyber <it>t </it>scores will be higher for not-null genes than for null genes, and we therefore expect:</p>
            <p><it>&#963;</it><sub>cyberTAll </sub>&gt; <it>&#963;</it><sub>cyberTNulls</sub></p>
            <p>Estimates of <it>P </it>values generated by Equation 4 with <inline-formula><m:math name="gb-2007-8-5-r69-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:mi>c</m:mi><m:mi>y</m:mi><m:mi>b</m:mi><m:mi>e</m:mi><m:mi>r</m:mi><m:mi>T</m:mi><m:mi>A</m:mi><m:mi>l</m:mi><m:mi>l</m:mi></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow></m:semantics></m:math></inline-formula> and <it>&#963;</it><sub>cyberTAll </sub>will therefore tend to be larger than <it>P </it>values that would be calculated with <inline-formula><m:math name="gb-2007-8-5-r69-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mover accent="true"><m:mrow><m:mi>c</m:mi><m:mi>y</m:mi><m:mi>b</m:mi><m:mi>e</m:mi><m:mi>r</m:mi><m:mi>T</m:mi><m:mi>N</m:mi><m:mi>u</m:mi><m:mi>l</m:mi><m:mi>l</m:mi><m:mi>s</m:mi></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow></m:semantics></m:math></inline-formula> and <it>&#963;</it><sub>cyberTNulls </sub>from only the null genes. As more and more genes are differentially expressed between two samples, conclusions based on the <it>P </it>values generated by Equation 4 should therefore become more conservative.</p>
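            <p>The direction of this effect can be illustrated with a small simulation. The sketch below assumes that the <it>P </it>value is the two-sided tail area of a normal distribution with the given mean and standard deviation (Equation 4 itself is defined elsewhere in the paper, so this form is an assumption), and the simulated scores, proportions, and effect sizes are invented for the example.</p>

```python
import math
import random
import statistics


def normal_two_sided_p(score, mean, sd):
    """Two-sided tail area of a normal distribution (assumed form of the P value)."""
    z = abs(score - mean) / sd
    return math.erfc(z / math.sqrt(2.0))


rng = random.Random(0)  # fixed seed so the illustration is repeatable
# Hypothetical mixture: 9,000 null scores plus 1,000 non-null scores,
# split evenly between up- and downregulated genes.
null_scores = [rng.gauss(0.0, 1.0) for _ in range(9000)]
up_scores = [rng.gauss(4.0, 1.0) for _ in range(500)]
down_scores = [rng.gauss(-4.0, 1.0) for _ in range(500)]
all_scores = null_scores + up_scores + down_scores

mean_null = statistics.mean(null_scores)
sd_null = statistics.pstdev(null_scores)
mean_all = statistics.mean(all_scores)
sd_all = statistics.pstdev(all_scores)

# The same score receives a larger P value against the wider "all genes"
# background than against the null-only background.
example_score = 3.0
p_with_nulls = normal_two_sided_p(example_score, mean_null, sd_null)
p_with_all = normal_two_sided_p(example_score, mean_all, sd_all)
```

            <p>Because the non-null genes inflate the standard deviation while leaving the mean near zero, every score is measured against a wider background, which is the sense in which the estimate is conservative.</p>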
         </sec>
         <sec>
            <st>
               <p>Scheme 4 has attractive sensitivity and specificity when controlling false discovery rate</p>
            </st>
            <p>In order to compile a list of genes that are differentially expressed between conditions, one requires not only a set of <it>P </it>values but also some way to set a significance threshold controlling for family-wise error rate or FDR. There are a large number of reasonable choices that one could make in determining a threshold for significance <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B11">11</abbr><abbr bid="B17">17</abbr></abbrgrp>. In this report, we choose to set a threshold for significance using the Benjamini and Hochberg algorithm <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, which is a simple and popular method for controlling FDR.</p>
            <p>Figure <figr fid="F8">8</figr> shows sensitivity and specificity for all 91 possible pair-wise comparisons in the Latin Square dataset at an FDR of 10%, as calculated using the Benjamini and Hochberg algorithm. We define sensitivity as the number of true positives recovered at the 10% FDR threshold divided by the total number of true positives in the Latin Square dataset. We define specificity as the number of true positives recovered at this threshold divided by the total number of genes recovered (that is, one minus the observed proportion of false discoveries). At a 10% FDR, we therefore expect a specificity of 0.9 or greater. We see that the <it>P </it>values generated by scheme 4 lead to appropriate balancing of sensitivity and specificity. For nearly all of the 91 comparisons, scheme 4 provides control of FDR at greater specificity than the expected 0.9, while maintaining an overall median sensitivity of about 0.9. In contrast, the <it>P </it>values generated using the standard <it>t </it>test and cyber <it>t </it>test lead to specificity that is considerably worse than the 0.9 implied by the nominal FDR. We conclude that, at least for the Latin Square dataset, Benjamini and Hochberg control of FDR fails under standard <it>t </it>and cyber <it>t </it>but succeeds under scheme 4. These findings suggest that the <it>P </it>values produced by scheme 4 can lead to more appropriate cutoffs for gene lists than either the standard <it>t </it>or cyber <it>t </it>tests.</p>
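            <p>The thresholding and the two summary measures can be sketched as follows. This is a minimal illustration of the standard Benjamini and Hochberg step-up procedure together with sensitivity and specificity as defined in this section; the function names and the toy <it>P </it>values are hypothetical.</p>

```python
def benjamini_hochberg(p_values, fdr):
    """Indices of genes called significant by the step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose P value is at or below fdr * rank / m.
        if fdr * rank / m >= p_values[i]:
            cutoff_rank = rank
    return set(order[:cutoff_rank])


def sensitivity_specificity(called, true_positives):
    """Sensitivity and specificity as defined in the text above."""
    hits = len(called.intersection(true_positives))
    sensitivity = hits / len(true_positives)
    # Specificity here is true positives divided by all genes called.
    specificity = hits / len(called) if called else float("nan")
    return sensitivity, specificity


# Toy example: six genes, of which genes 0, 2 and 5 are truly spiked in.
toy_p = [0.001, 0.5, 0.002, 0.9, 0.03, 0.7]
toy_called = benjamini_hochberg(toy_p, fdr=0.1)
toy_sens, toy_spec = sensitivity_specificity(toy_called, {0, 2, 5})
```

            <p>In the toy example one of the three called genes is a false positive, so the specificity as defined here falls below the 0.9 that a 10% FDR would lead one to expect.</p>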
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Sensitivity and specificity of different statistics for the Latin Square dataset</p>
               </caption>
               <text>
                  <p>Sensitivity and specificity of different statistics for the Latin Square dataset. Sensitivity and specificity using the Benjamini and Hochberg algorithm to control false discovery rate at 10% using the <it>P </it>values supplied by the various schemes for all 91 possible pair-wise comparisons in the Latin Square dataset.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-8"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>On biologic replicates, scheme 4 yields conservative, but reasonable, estimates of significant genes</p>
            </st>
            <p>To assess the performance of scheme 4 on real, as opposed to spike-in data, we here present a previously unpublished dataset involving isogenic biologic replicates of untransformed mouse cell lines derived from a single individual. In these experiments, two cell lines, myeloid (MY) and embryoid blast (EB), were exposed for 60 min to either dimethyl sulfoxide (DMSO) alone or DMSO plus the chemotherapeutic agent etoposide at 50 &#956;mol/l. Cells were allowed to recover for either 4 hours or 24 hours. Five biological replicates (distinct plates of cells) in each condition were hybridized to the Mouse430_2 chip for a total of 40 experiments (two time points &#215; two experimental conditions &#215; two tissue types &#215; five biologic replicates). For each time point at each tissue type, we consider how many genes are differentially expressed when comparing the cells exposed to drug with control cells. Table <tblr tid="T1">1</tblr> shows the number of differentially expressed genes at a 10% FDR, as calculated in four different ways. For the standard <it>t </it>test, cyber <it>t </it>statistic, and scheme 4, we fed the <it>P </it>values generated by these tests into the Benjamini and Hochberg <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> FDR algorithm. For the significance analysis of microarrays (SAM) statistic <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, we used the implementation of SAM provided by TIGR MeV (see Materials and methods, below). As might be expected based on the results from the Latin Square control dataset (Figure <figr fid="F5">5</figr>), we see in Table <tblr tid="T1">1</tblr> that the <it>P </it>values from scheme 4 lead to a much more conservative estimate of significance than do the <it>P </it>values from the cyber <it>t </it>test or the standard <it>t </it>test.</p>
            <tbl id="T1" hint_layout="double">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Number of genes called significant at 10% false discovery rate on isogenic biologic replicates</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Statistic</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Embryoid blast</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Myeloid</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>4 hours</p>
                     </c>
                     <c ca="left">
                        <p>24 hours</p>
                     </c>
                     <c ca="left">
                        <p>4 hours</p>
                     </c>
                     <c ca="left">
                        <p>24 hours</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Standard <it>t </it>(scheme 1)</p>
                     </c>
                     <c ca="left">
                        <p>32</p>
                     </c>
                     <c ca="left">
                        <p>8,038</p>
                     </c>
                     <c ca="left">
                        <p>7,154</p>
                     </c>
                     <c ca="left">
                        <p>331</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cyber <it>t </it>(scheme 2)</p>
                     </c>
                     <c ca="left">
                        <p>1,288</p>
                     </c>
                     <c ca="left">
                        <p>9,769</p>
                     </c>
                     <c ca="left">
                        <p>10,349</p>
                     </c>
                     <c ca="left">
                        <p>4,464</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Scheme 4</p>
                     </c>
                     <c ca="left">
                        <p>90</p>
                     </c>
                     <c ca="left">
                        <p>87</p>
                     </c>
                     <c ca="left">
                        <p>268</p>
                     </c>
                     <c ca="left">
                        <p>38</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SAM</p>
                     </c>
                     <c ca="left">
                        <p>4,954</p>
                     </c>
                     <c ca="left">
                        <p>14,239</p>
                     </c>
                     <c ca="left">
                        <p>16,644</p>
                     </c>
                     <c ca="left">
                        <p>9,392</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Each entry in this table is the number of significant genes when comparing treatment (DMSO + drug) with control (just DMSO) for a cell type (embryoid blast or myeloid) and a time point (4 or 24 hours). All data were subject to quantile-quantile normalization and summarized using RMA. The Benjamini and Hochberg algorithm was used on <it>P </it>values generated by the standard <it>t </it>test, cyber <it>t</it>, and scheme 4 to calculate false discovery rate. The values for SAM were generated by the TIGR Multiple Array Viewer (see Materials and methods) with 100 permutations and the default parameters. DMSO, dimethyl sulfoxide; RMA, Robust Multichip Average; SAM, significance analysis of microarrays.</p>
               </tblfn>
            </tbl>
            <p>How reasonable are the various predictions of differentially expressed genes shown in Table <tblr tid="T1">1</tblr>? Of course, because this is not a 'spike in' dataset, we do not know how many genes were truly differentially expressed. Nonetheless, we can still make some assessment of how the various algorithms perform. Figure <figr fid="F9">9</figr> shows the average RMA score in treatment versus control for myeloid (MY) samples at 4 hours. The red symbols show the genes marked significant at 10% FDR under the standard <it>t </it>test (Figure <figr fid="F9">9b</figr>) and scheme 4 (Figure <figr fid="F9">9a</figr>). We note that the Pearson <it>r</it><sup>2 </sup>correlation between baseline and experiment averages in Figure <figr fid="F9">9</figr> is 0.991. Given the subtle sources of noise in a microarray experiment such as cross-hybridization (Figure <figr fid="F6">6</figr>) <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and the tight correlation between baseline and experiment samples, the finding of 7,154 differentially expressed genes through the standard <it>t </it>route in Table <tblr tid="T1">1</tblr> seems unreasonable, as do the 10,349 genes found through <it>P </it>values reported by cyber <it>t </it>and the 16,644 genes found significant through the SAM analysis. We also note that in all four experimental conditions, there were no gross morphologic changes or obvious differences in growth between drug-exposed and control groups (data not shown). If there really were many thousands of genes differentially expressed between drug and control groups, then one would expect to see large differences in the appearance and the behavior of the cells. The lack of such differences reinforces our argument that the more modest number of genes predicted to be differentially expressed by scheme 4 seems more reasonable than the results produced using other methods.</p>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Positives at 10% FDR identified by scheme 4 and the standard <it>t </it>test</p>
               </caption>
               <text>
                  <p>Positives at 10% FDR identified by scheme 4 and the standard <it>t </it>test. Shown are the averages of RMA scores across the five replicates comparing baseline (DMSO only) and experiment (DMSO + drug) for myeloid cells 4 hours after treatment. The same data are shown in both panels. <b>(a,b) </b>Red symbols show genes called significant in Table 1 from (panel b) the standard <it>t </it>test (scheme 1) and (panel a) under scheme 4 at a 10% false discovery rate (FDR), as calculated using the Benjamini and Hochberg algorithm.</p>
               </text>
               <graphic file="gb-2007-8-5-r69-9"/>
            </fig>
            <p>Although we have argued that the scheme 4 route to FDR should be conservative, the tight correlation shown in Figure <figr fid="F9">9</figr> raises the possibility that we have overestimated the number of genes that are truly differentially expressed. Could it be that there are in fact no differentially expressed genes in these experiments? We can take advantage of our experimental design to rule out that possibility. Of the 90 genes that are significant by scheme 4-based FDR between treatment and control (Table <tblr tid="T1">1</tblr>) for EB samples at 4 hours, 15 are also among the 87 genes found significant by scheme 4 in the EB samples at 24 hours. Using the hypergeometric distribution, we can reject the null hypothesis that the genes found to be significant in EB samples at 4 hours are unrelated to the genes found to be significant in EB samples at 24 hours (<it>P </it>&lt; 10<sup>-25</sup>). A significant fraction of the genes found to be significant with scheme 4 are therefore reproducibly differentially expressed across the 4-hour and 24-hour time points.</p>
            <p>Likewise, of the 38 genes found to be significant by scheme 4 for MY samples at 24 hours, 15 are also among the 268 genes found to be significant by scheme 4 in the MY samples at 4 hours (<it>P </it>&lt; 10<sup>-24</sup>). We have some evidence, therefore, that the scheme 4 route was not inappropriately anticonservative in this analysis. That is, at least some of the genes identified by scheme 4 were indeed differentially expressed.</p>
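            <p>The overlap test described above can be sketched in Python as follows. This is an illustrative sketch only, not the code used for the analysis: the function name is hypothetical, and the population size of 45,000 (the approximate number of probe sets on the array) is an assumption. The resulting tail probability depends on that assumed population size.</p>
            <p>
```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(population, successes, draws, overlap):
    """P(X >= overlap) when `draws` items are taken without replacement
    from `population` items, `successes` of which are marked."""
    total = comb(population, draws)
    tail = Fraction(0)
    for k in range(overlap, min(successes, draws) + 1):
        ways = comb(successes, k) * comb(population - successes, draws - k)
        tail += Fraction(ways, total)
    return float(tail)


# Assumed population size: about 45,000 probe sets on the array.
# 87 genes significant at 24 hours, 90 at 4 hours, 15 shared (EB samples).
p_eb = hypergeom_upper_tail(45000, 87, 90, 15)
print(p_eb)
```
            </p>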
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this paper we argue that microarray statistics work best when the estimate of standard error from each gene on the array is ignored or suppressed. We are not the first group to suggest that estimates of variance from individual genes are unreliable. Previous studies have noted improved statistics when a constant is added to the variance <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B11">11</abbr></abbrgrp> or when the variance is weighted by the variance from neighboring genes <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. We argue that if the variance from each gene is truly unknown, then it makes sense to consider all of the genes on the array as arising from a single, normal distribution. We have demonstrated that this assumption of a single normal distribution of all genes comes much closer to producing a uniform distribution of <it>P </it>values than does production of <it>P </it>values from the <it>t </it>distribution (Figures <figr fid="F4">4</figr> and <figr fid="F7">7</figr>).</p>
         <p>It is not immediately clear why algorithms, such as the standard <it>t </it>test, that attempt to estimate the standard error of each individual gene perform so poorly. Difficulties in accurately estimating the variance of each individual gene may arise because of the modest sample sizes in typical microarray experiments. It has also been shown that the normalization process may distort the variance of genes <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. We have seen, however, that the standard <it>t </it>test does much better on quantile-quantile normalized data than on non-normalized data (Figure <figr fid="F2">2</figr>), even though only about 10% of the original estimate of standard error survives normalization (Figure <figr fid="F3">3</figr>). The distortion of variance by normalization is apparently helpful. We argue that it is helpful because it replaces the original estimate of standard error for each gene with a distribution that approaches a small constant (Figure <figr fid="F1">1</figr>). This is consistent with the true variance for each gene being unknown regardless of normalization procedures.</p>
         <p>The cyber <it>t </it>test uses Bayesian statistics to weigh the variance of each gene by the variance of genes with similar intensities on the array (see Materials and methods, below). As the experimental sample size increases, the weight given to the measured variance is increased while the weight given to the variance shared among similar genes is decreased. At large sample sizes, the performance of the cyber <it>t </it>test will therefore approach the performance of the standard <it>t </it>test. This behavior of the cyber <it>t </it>test is appropriate if the measured variance approaches the true variance as sample size increases. If, however, there are other factors at work in addition to small sample size that cause the measured variance to be unreliable, then the performance of the cyber <it>t </it>test may degrade as the sample size increases and the weight assigned to the background variance is therefore diminished. There is an urgent need for control datasets with larger sample sizes to determine whether the unreliability of the measured variance is primarily a function of small sample size or is somehow being caused by other aspects of microarray technology.</p>
         <p>A recent controversy in the microarray literature has centered directly on the assumption of the uniform distribution of null <it>P </it>values. In analyzing a spike-in dataset, Choe and coworkers <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> found that predicted FDRs from the SAM <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> algorithm appeared to be greatly anticonservative when compared with actual FDRs. In response, Dabney and Storey <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> noted that the anticonservative behavior of the SAM algorithm could be explained by the non-uniform distribution of <it>P </it>values among the non-spiked-in genes. In the Choe dataset, non-spiked-in genes had a surprising tendency to have <it>P </it>values too close to zero. Dabney and Storey argued that this non-uniform distribution was caused by errors in the experimental design of the spike-in dataset, a charge that was partly echoed by a second reanalysis of the Choe dataset <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. These charges have been vigorously disputed by the authors of the Choe dataset, who argue that the non-uniform distribution of <it>P </it>values may be a common feature of microarray data <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>.</p>
         <p>Our study lends support to the arguments presented by Choe and coworkers. There are only 42 genes spiked in to the Latin Square dataset, but even this modest number of genes can produce detectable distortions in the distribution of <it>P </it>values among null genes (Figure <figr fid="F6">6</figr>). Given that the Choe dataset includes more than a thousand spiked-in genes, it is not surprising that the null genes in the Choe dataset have profoundly distorted <it>P </it>values. Moreover, the original analysis of the Choe dataset used cyber <it>t </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, whereas the reanalysis used a standard <it>t </it>test <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. We have shown that both of these tests can distort the distribution of null <it>P </it>values (Figures <figr fid="F4">4</figr> and <figr fid="F7">7</figr>). In their reports <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr></abbrgrp>, Choe and coworkers suggest multiple normalization steps as a way to avoid bias in the test statistics. We find that a second normalization step does make a small difference in producing uniform <it>P </it>values (Figures <figr fid="F4">4</figr> and <figr fid="F7">7</figr>). We argue, however, that a larger difference can be made by finding a more appropriate distribution of microarray scores than the <it>t </it>distribution.</p>
         <p>A problem with all microarray statistics papers is that they are dependent on the datasets analyzed. It is a constant worry that the assumptions made with regard to one dataset will not apply to new datasets in the future; that is to say, one may have, in effect, constructed a statistic that is 'over-trained' to the datasets considered. The main assumption that we have made in this paper is that it is reasonable to treat the standard error from each gene as a constant. This assumption appears to be reasonable for the Latin Square and technical replicate data we have examined (Figures <figr fid="F4">4</figr> and <figr fid="F7">7</figr>). It is not, however, a perfect assumption. The distributions of <it>P </it>values observed in Figures <figr fid="F4">4</figr> and <figr fid="F7">7</figr> are not perfectly uniform. This assumption is clearly more reasonable, however, than the assumptions used to generate the <it>P </it>values for the standard <it>t </it>and cyber <it>t </it>tests, because <it>P </it>values produced by these tests are far from uniform (Figures <figr fid="F4">4</figr> and <figr fid="F7">7</figr>). Genes in datasets that contain biologic replicates will, of course, exhibit a greater degree of variance than genes in the technical replicates that, by necessity, make up control datasets. Despite this, our assumptions appear to produce more reasonable results when applied to a 'real' biologic dataset than the assumptions of the cyber <it>t</it>, standard <it>t</it>, or SAM procedures (Table <tblr tid="T1">1</tblr>).</p>
         <p>We have seen that even within the Latin Square dataset, cross-hybridization can affect probe sets that are annotated as null, distorting <it>P </it>values and complicating FDRs (Figure <figr fid="F6">6</figr>). Microarray experiments are prone to other artifacts, which are incompletely understood. These include saturation of probes at high signal <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, nonequilibrium hybridization conditions <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, and artifacts that arise from the dyes used in microarray experiments <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. A recent study found that different laboratories performing the same microarray experiment on the same RNA sample obtained large differences in their results, although the results from the best performing laboratories exhibited a greater degree of correlation <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Given such a challenging environment, calculation of accurate FDRs remains a difficult proposition. We argue that because FDR calculations are liable to be distorted by subtle artifacts, one should err on the conservative side. We have taken a simple approach and shown that it is possible to generate a reasonable set of <it>P </it>values in a way that should become more conservative as differences increase between sets of chips. In the many cases where a conservative statistic is appropriate, we believe this approach may yield more reasonable gene lists than other currently employed methods.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Implementation of statistics</p>
            </st>
            <p>The uniform distribution of <it>P </it>values in Figures <figr fid="F4">4</figr>, <figr fid="F6">6</figr>, and <figr fid="F7">7</figr> was calculated simply as the rank of each gene divided by the total number of genes. So, for example, if there were 22,000 genes in a list ordered by statistic score, then the expected <it>P </it>value for the first gene under a uniform distribution was 1/22,000. The expected <it>P </it>value for the second gene was 2/22,000, and so forth.</p>
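            <p>As a minimal illustration of this calculation (a Python sketch with a hypothetical function name, not the code used to produce the figures), the expected <it>P </it>values for an ordered gene list can be generated as follows.</p>
            <p>
```python
def expected_uniform_p_values(n_genes):
    """Expected P value at each 1-based rank under a uniform null distribution."""
    return [rank / n_genes for rank in range(1, n_genes + 1)]


# Example with the 22,000-gene list used in the text.
expected = expected_uniform_p_values(22000)
print(expected[0], expected[1])
```
            </p>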
            <p>Background correction, quantile-quantile normalization, and RMA summary values were calculated with RMA express <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. In cases in which data were not normalized, background subtraction was also not performed, but RMA summary values were still generated on non-normalized data with RMA express. All RMA values are reported on a log<sub>2 </sub>scale.</p>
            <p>The HG-U133A Latin Square dataset was downloaded from Affymetrix (Santa Clara, CA, USA) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. For the Latin Square dataset, probe sets 209374_s_at, 205397_x_at, and 208010_s_at were excluded for all analyses, as instructed by the HG-U133A_tag_Latin_Square.xls spreadsheet. We also excluded any probe set that started with AFFX- and was not among the spike-in probe sets. This left 42 true positives and 22,182 true negatives.</p>
            <p>For the cyber <it>t </it>algorithm we used implementations available in the R Bioconductor package with the default parameters. The cyber <it>t </it>code was downloaded from the cyber <it>t </it>web page <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The cyber <it>t </it>test compares arrays for genes in two conditions, producing a <it>P </it>value for each gene under the null hypothesis that the mean intensity in each condition is the same. For each gene in each of the two conditions, the cyber <it>t </it>test with the default parameters calculates a weighted standard deviation as follows:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2007-8-5-r69-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>SD</m:mi>
                              <m:mi>cyberT</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:msqrt>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mn>10</m:mn>
                                    <m:mo>*</m:mo>
                                    <m:msubsup>
                                       <m:mi>SD</m:mi>
                                       <m:mi>Window</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msubsup>
                                    <m:mo>+</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>n</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>*</m:mo>
                                    <m:msup>
                                       <m:mi>SD</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msup>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mn>10</m:mn>
                                    <m:mo>+</m:mo>
                                    <m:mi>n</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>2</m:mn>
                                 </m:mrow>
                              </m:mfrac>
                           </m:msqrt>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8tuQ8FMI8Gi=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciGacaGaaeqabaqadeqadaaakeaaieaacaWFtbGaa8hramaaBaaaleaacaWFJbGaa8xEaiaa=jgacaWFLbGaa8NCaiaa=rfaaeqaaOGaeyypa0ZaaOaaaeaadaWcaaqaaiaa=fdacaWFWaGaaiOkaiaa=nfacaWFebWaa0baaSqaaiaa=DfacaWFPbGaa8NBaiaa=rgacaWFVbGaa83Daaqaaiaa=jdaaaGccqGHRaWkcaWFOaGaa8NBaiabgkHiTiaa=fdacaGGPaGaaiOkaiaa=nfacaWFebWaaWbaaSqabeaacaaIYaaaaaGcbaGaa8xmaiaa=bdacqGHRaWkcaWFUbGaeyOeI0Iaa8NmaaaaaSqabaaaaa@528B@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Here, n is the sample size (the number of arrays in the condition), SD is the standard deviation as it is usually calculated, and SD<sub>Window </sub>is the average of the standard deviation of the 100 genes with the average intensity closest to the average intensity of the gene under consideration. The cyber <it>t </it>score is then calculated in the same way as the standard <it>t </it>test, with the SD<sub>cyberT </sub>value for each condition replacing the conventional standard deviation for each condition and an adjusted degrees of freedom of 20 + n<sub>1 </sub>+ n<sub>2 </sub>- 4 (where n<sub>1 </sub>is the number of arrays in condition 1 and n<sub>2 </sub>is the number of arrays in condition 2). For more details, see the cyber <it>t </it>report <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and web page <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
            <p>The Benjamini and Hochberg algorithm <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> was implemented in Java. For the gene at rank k in a gene list ordered by <it>P </it>value, the predicted FDR is given by N &#215; p(k)/k, where N is the number of genes in the list and p(k) is the <it>P </it>value produced by the test statistic under the null hypothesis of no differential expression for that gene.</p>
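            <p>The weighted standard deviation and the adjusted degrees of freedom described above can be sketched in Python as follows. This is an illustration of the two expressions only, not the cyber <it>t </it>code itself; the function names and the parameter name prior_weight (standing for the constant 10 in the default settings) are hypothetical.</p>
            <p>
```python
from math import sqrt


def cyber_t_sd(sd, sd_window, n, prior_weight=10):
    """Weighted standard deviation for one gene in one condition.

    sd: conventional standard deviation of the gene across n arrays.
    sd_window: average standard deviation of genes with similar intensity.
    prior_weight: the constant 10 in the default settings (assumed name).
    """
    numerator = prior_weight * sd_window ** 2 + (n - 1) * sd ** 2
    return sqrt(numerator / (prior_weight + n - 2))


def cyber_t_degrees_of_freedom(n1, n2, prior_weight=10):
    """Adjusted degrees of freedom; 20 + n1 + n2 - 4 when prior_weight is 10."""
    return 2 * prior_weight + n1 + n2 - 4


# Example: five arrays per condition.
print(cyber_t_sd(sd=0.3, sd_window=0.1, n=5))
print(cyber_t_degrees_of_freedom(5, 5))
```
            </p>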
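            <p>A minimal Python sketch of this expression is given below. It is not the Java implementation used in this study, and the function name is hypothetical. It evaluates only N &#215; p(k)/k at each rank, without the monotonicity adjustment that some Benjamini and Hochberg implementations apply afterwards.</p>
            <p>
```python
def predicted_fdr(p_values):
    """Return N * p(k) / k for each P value, in ascending order of P value."""
    ordered = sorted(p_values)
    n = len(ordered)
    # k is the 1-based rank of each P value in the ordered list.
    return [n * p / k for k, p in enumerate(ordered, start=1)]


print(predicted_fdr([0.01, 0.04, 0.03]))
```
            </p>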
            <p>For SAM, we used the implementation in the Multiple Experiment Viewer <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp> provided by TIGR <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
            <p>The cumulative distribution function (cdf) in Equation 4 was evaluated using the pnorm function in the class StatFunction.java implemented by Sundar Dorai-Raj and downloaded from the Dorai-Raj web page <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. This function yields values equivalent to those of the R function pnorm with lower.tail = FALSE.</p>
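            <p>For readers without access to R or the Java class, the upper-tail normal probability can be sketched in Python using the complementary error function from the standard library. The function name is hypothetical, and this is an equivalent formulation rather than the code used in this study.</p>
            <p>
```python
from math import erfc, sqrt


def upper_tail_normal(x, mean=0.0, sd=1.0):
    """P(X > x) for a normal distribution, analogous to pnorm with lower.tail = FALSE."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2.0)))


print(upper_tail_normal(0.0))
```
            </p>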
            <p>The Kolmogorov-Smirnov test was ported to Java from the Numerical Recipes in C++ text <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The Java port was tested against the ks.test method in R. In cases in which the Kolmogorov-Smirnov test returned a zero, the -log<sub>10 </sub>value was set to 200 (Figure <figr fid="F4">4</figr>) or 350 (Figures <figr fid="F6">6</figr> and <figr fid="F7">7</figr>).</p>
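            <p>The substitution for zero-valued results described above can be sketched in Python as follows. The function name is hypothetical, and the cap (200 or 350) is passed in by the caller.</p>
            <p>
```python
from math import log10


def capped_neg_log10(p_value, cap):
    """Return -log10(p_value), substituting `cap` when p_value is exactly zero."""
    if p_value == 0:
        return cap
    return -log10(p_value)


print(capped_neg_log10(0.0, 200))
print(capped_neg_log10(0.001, 200))
```
            </p>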
            <p>Loess regression lines were generated by the Java class Lowess.java in the package org.tigr.midas.engine distributed as part of the TIGR midas engine <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
            <p>All statistics except cyber <it>t </it>and the results of RMA express were implemented in Java. Implementations of the equations presented in this report can be found in the supplementary materials (Additional data file 11) and at the author's web page <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Etoposide treatment</p>
            </st>
            <p>Mouse embryonic stem cells were differentiated into isogenic bursting embryoid body (EB) cells or isogenic myeloid (MY) hematopoietic cells. A population of about 10<sup>6 </sup>EB or MY hematopoietic cells was seeded onto five replicate dishes and expanded to obtain the appropriate number of cells per plate for treatment. About 1.5 &#215; 10<sup>7 </sup>EB or MY hematopoietic cells were exposed to either DMSO plus etoposide at a final concentration of 50 &#956;mol/l or to DMSO (control) for 60 min. Each etoposide or DMSO (control) treatment was performed on the five replicate dishes. The etoposide stock solution or DMSO (control) was diluted in Iscove's modified Dulbecco's medium supplemented with 10% non-ES-qualified fetal bovine serum. Following etoposide or DMSO (control) exposure, all samples were washed twice in 1&#215; phosphate-buffered saline and plated in fresh medium for a recovery period of 4 or 24 hours. Following recovery, all cells were washed twice in 1&#215; phosphate-buffered saline and harvested for RNA isolation.</p>
         </sec>
         <sec>
            <st>
               <p>RNA isolation and processing for microarrays</p>
            </st>
            <p>Control and etoposide-treated cells were pelleted by centrifugation and lysed in TRIzol Reagent (Invitrogen, Carlsbad, CA, USA; 1 ml per 10 &#215; 10<sup>6 </sup>cells) by repetitive pipetting followed by incubation at room temperature for 5 min. Total RNA was recovered by phenol-chloroform extraction and isopropyl alcohol precipitation. Extracted RNA was further purified using the RNeasy mini kit (Qiagen, Valencia, CA, USA). Biotin-labeled cDNA was prepared from the purified RNA samples using the Ovation&#8482; Biotin RNA Amplification and Labeling System (NuGEN Technologies, Inc., San Carlos, CA, USA), in accordance with the manufacturer's protocol. Briefly, first-strand and second-strand cDNA synthesis was followed by amplification of the double-stranded DNA template. Amplified cDNA was then fragmented and labeled with biotin. Biotin-labeled cDNA was purified using the DyeEx 2.0 Spin Kit (Qiagen), and product yield and purity were determined using A260, A280, and A320 spectrophotometric measurements. Fragmented, biotin-labeled cDNA (2.2 &#956;g) from each sample was hybridized to a GeneChip Mouse Genome 430 2.0 array (Affymetrix, Inc.). The Mouse Genome 430 2.0 array contains 45,000 probe sets used to analyze the expression level of over 39,000 transcripts from over 34,000 mouse genes. Hybridization, washing, staining, and scanning of microarrays were performed by the Gene Chip Analysis Facility in the Institute for Cancer Genetics, Columbia University Health Sciences Division, New York, USA.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this manuscript. Raw data (in the form of 40 .cel files) for the Etoposide experiments described in Table <tblr tid="T1">1</tblr> can be found in the supplementary materials of this paper (Additional files <supplr sid="S1">1</supplr>, <supplr sid="S2">2</supplr>, <supplr sid="S3">3</supplr>, <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>, <supplr sid="S6">6</supplr>, <supplr sid="S7">7</supplr>, <supplr sid="S8">8</supplr>, <supplr sid="S9">9</supplr>, <supplr sid="S10">10</supplr> compressed with bzip2). Filenames within the compressed files that start with MY indicate myeloid hematopoietic cells, whereas filenames starting with EB indicate bursting embryoid bodies. The third character of each filename indicates treatment with drug + DMSO ("E") or just control DMSO ("C"). The next character of each filename indicates the replicate number. The final characters indicate the time point. So, for example, "MYE34.CEL" indicates myeloid hematopoietic cells ("MY") treated with drug ("E"), replicate number 3, 4 hour time point. "EBC124.CEL" indicates bursting embryoid bodies ("EB"), treated with only DMSO ("C"), replicate number 1, 24 hour time point. Additional data file <supplr sid="S11">11</supplr> provides implementations of the equations presented in this report in R.</p>
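         <p>The filename convention described above can be decoded with a short Python sketch such as the following. The function name and the returned field names are hypothetical, and the pattern assumes replicate numbers 1 to 5 and time points of 4 or 24 hours, consistent with the files listed below.</p>
         <p>
```python
import re

# Cell type, treatment, replicate digit, then time point in hours.
CEL_NAME = re.compile(r"(MY|EB)([EC])([1-5])(4|24)\.CEL")


def parse_cel_name(filename):
    """Split a CEL filename into cell type, treatment, replicate and time point."""
    match = CEL_NAME.fullmatch(filename)
    if match is None:
        raise ValueError("unrecognized CEL filename: " + filename)
    cell, treatment, replicate, hours = match.groups()
    return {
        "cell_type": "myeloid" if cell == "MY" else "embryoid body",
        "treatment": "drug + DMSO" if treatment == "E" else "DMSO control",
        "replicate": int(replicate),
        "hours": int(hours),
    }


print(parse_cel_name("MYE34.CEL"))
print(parse_cel_name("EBC124.CEL"))
```
         </p>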
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>CEL files for the EB samples</p>
            </caption>
            <text>
               <p>This file contains EBC14.CEL, EBC24.CEL, EBC34.CEL, and EBC44.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S1.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>CEL files for the EB samples</p>
            </caption>
            <text>
               <p>This file contains EBC54.CEL, EBE14.CEL, EBE24.CEL, and EBE34.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S2.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>CEL files for the EB samples</p>
            </caption>
            <text>
               <p>This file contains EBE44.CEL, EBE54.CEL, EBC124.CEL, and EBC224.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S3.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>CEL files for the EB samples</p>
            </caption>
            <text>
               <p>This file contains EBC324.CEL, EBC424.CEL, EBC524.CEL, and EBE124.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S4.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>CEL files for the EB samples</p>
            </caption>
            <text>
               <p>This file contains EBE224.CEL, EBE324.CEL, EBE424.CEL, and EBE524.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S5.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>CEL files for the MY samples</p>
            </caption>
            <text>
               <p>This file contains MYC14.CEL, MYC24.CEL, MYC34.CEL, and MYC44.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S6.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>CEL files for the MY samples</p>
            </caption>
            <text>
               <p>This file contains MYC54.CEL, MYE14.CEL, MYE24.CEL, and MYE34.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S7.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional data file 8</p>
            </title>
            <caption>
               <p>CEL files for the MY samples</p>
            </caption>
            <text>
               <p>This file contains MYE44.CEL, MYE54.CEL, MYC124.CEL, and MYC224.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S8.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S9">
            <title>
               <p>Additional data file 9</p>
            </title>
            <caption>
               <p>CEL files for the MY samples</p>
            </caption>
            <text>
               <p>This file contains MYC324.CEL, MYC424.CEL, MYC524.CEL, and MYE124.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S9.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S10">
            <title>
               <p>Additional data file 10</p>
            </title>
            <caption>
               <p>CEL files for the MY samples</p>
            </caption>
            <text>
               <p>This file contains MYE224.CEL, MYE324.CEL, MYE424.CEL, and MYE524.CEL.</p>
            </text>
            <file name="gb-2007-8-5-r69-S10.bz2">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S11">
            <title>
               <p>Additional data file 11</p>
            </title>
            <caption>
               <p>Implementations of equations in R</p>
            </caption>
            <text>
               <p>Provided are implementations of the equations presented in this report in R.</p>
            </text>
            <file name="gb-2007-8-5-r69-S11.zip">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank two anonymous reviewers for insightful comments. We thank Jonathan McCafferty and Richard Francis for technical assistance, and Ed Boyden and Alex Gordon for critical comments on an earlier version of this manuscript. Rafael A Irizarry generously provided the .CEL files for the technical replicates shown in Figure <figr fid="F7">7</figr>.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Significance analysis of microarrays applied to the ionizing radiation response.</p>
            </title>
            <aug>
               <au>
                  <snm>Tusher</snm>
                  <fnm>VG</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>5116</fpage>
            <lpage>5121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">33173</pubid>
                  <pubid idtype="pmpid" link="fulltext">11309499</pubid>
                  <pubid idtype="doi">10.1073/pnas.091062498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Statistical significance for genomewide studies.</p>
            </title>
            <aug>
               <au>
                  <snm>Storey</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>9440</fpage>
            <lpage>9445</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">170937</pubid>
                  <pubid idtype="pmpid" link="fulltext">12883005</pubid>
                  <pubid idtype="doi">10.1073/pnas.1530509100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The control of the false discovery rate in multiple testing under dependency.</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yekutieli</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Ann Stat</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>1165</fpage>
            <lpage>1188</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1214/aos/1013699998</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Identifying differentially expressed genes using false discovery rate controlling procedures.</p>
            </title>
            <aug>
               <au>
                  <snm>Reiner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yekutieli</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>368</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btf877</pubid>
                  <pubid idtype="pmpid" link="fulltext">12584122</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A reanalysis of a published Affymetrix GeneChip control dataset.</p>
            </title>
            <aug>
               <au>
                  <snm>Dabney</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Storey</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>401</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1557755</pubid>
                  <pubid idtype="pmpid" link="fulltext">16563185</pubid>
                  <pubid idtype="doi">10.1186/gb-2006-7-3-401</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Variance stabilization applied to microarray data calibration and to the quantification of differential expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Hu