<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2002-3-12-research0073</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A strategy for oligonucleotide microarray probe reduction</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Antipova</snm>
               <mi>A</mi>
               <fnm>Alena</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
            </au>
            <au id="A2">
               <snm>Tamayo</snm>
               <fnm>Pablo</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3" ca="yes">
               <snm>Golub</snm>
               <mi>R</mi>
               <fnm>Todd</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>golub@genome.wi.mit.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center for Genome Research, Whitehead Institute/Massachusetts Institute of Technology, Cambridge, MA 02139, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02115, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2002</pubdate>
         <volume>3</volume>
         <issue>12</issue>
         <fpage>research0073.1</fpage>
         <lpage>research0073.4</lpage>
         <url>http://genomebiology.com/2002/3/12/research/0073</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2002-3-12-research0073</pubid>
               <pubid idtype="pmpid">12537562</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>16</day>
               <month>8</month>
               <year>2002</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>23</day>
               <month>9</month>
               <year>2002</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>11</day>
               <month>10</month>
               <year>2002</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>11</month>
               <year>2002</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2002</year>
         <collab>Antipova et al., licensee BioMed Central Ltd</collab>
      </cpyrt>
      <shorttitle>
         <p>A strategy for oligonucleotide microarray probe reduction</p>
      </shorttitle>
      <shortabs>
         <p>One of the factors limiting the number of genes that can be analyzed on high-density oligonucleotide arrays is that each transcript is probed by multiple oligonucleotide probes. To reduce the number of probes required for each gene, a systematic approach to choosing the most representative probes is needed. A generalizable empiric method is presented for reducing the number of probes per gene while maximizing the fidelity to the original array design.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>One of the factors limiting the number of genes that can be analyzed on high-density oligonucleotide arrays is that each transcript is probed by multiple oligonucleotide probes. To reduce the number of probes required for each gene, a systematic approach to choosing the most representative probes is needed. A method is presented for reducing the number of probes per gene while maximizing the fidelity to the original array design.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The methodology has been tested on a dataset comprising 317 Affymetrix HuGeneFL GeneChips. The performance of the original and reduced probe sets was compared in four cancer-classification problems. The results of these comparisons show that reduction of the probe set by 95% does not dramatically affect performance, and thus illustrate the feasibility of substantially reducing probe numbers without significantly compromising sensitivity and specificity of detection.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>The strategy described here is potentially useful for designing small, limited-probe genome-wide arrays for screening applications.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010003">Cancer</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>DNA microarrays have become commonplace for the genome-wide measurement of mRNA expression levels. The first described microarray for this purpose, the cDNA microarray, involves the mechanical deposition of cDNA clones on glass slides [<abbr bid="B1">1</abbr>]. Although this strategy has proved highly effective, it has two limitations: cross-hybridization can occur between mRNAs and non-unique or repetitive portions of the cDNA clone; and the maintenance and quality control of large, arrayed cDNA libraries can be challenging. For these reasons, oligonucleotide microarrays have at least theoretical advantages. Short probes (25 nucleotides or longer) can be selected on the basis of their sequence specificity, and either synthesized <it>in situ</it> (by photolithography or inkjet technology) on a solid surface or conventionally synthesized and then robotically deposited.</p>
         <p>The first oligonucleotide microarrays contained hundreds of distinct probes per gene in order to maximize sensitivity and specificity of detection [<abbr bid="B2">2</abbr>]. Over the past few years, the number of probes per gene has decreased as increasing amounts of sequence information have become available, probe-selection algorithms have improved, feature sizes have decreased and researchers have wanted to maximize the number of genes assayable on a single microarray. Nevertheless, no single array representing the entire human genome has been described. Furthermore, to date, no systematic high-throughput method has been published that can be used for reducing the number of probes per gene while maximizing the sensitivity and specificity of these reduced probe sets.</p>
         <p>Several strategies for probe reduction could be considered. Probes could be selected at random, but given that different probes can have dramatically different hybridization properties, this random method would be likely to result in failure, at least for some genes. Alternatively, one could assess the fidelity of candidate probes by comparison to a gold standard of gene-expression measurement such as real-time quantitative PCR or Northern blotting. Such approaches, however, are not feasible at a genome-wide scale. We report here a generalizable, empiric strategy for probe reduction that eliminates 95% of probes, yet maximizes fidelity to the original microarray design.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The experiments described here are based on HuGeneFL GeneChips commercially available from Affymetrix. These arrays contain approximately 282,000 25-mer oligonucleotide probes corresponding to 6,817 human genes and expressed sequence tags (ESTs) (a total of 7,129 probe sets). On average, each gene is represented by 40 probes: 20 'perfect match' probes that are complementary to the mRNA sequence of interest, and 20 'mismatch' probes that differ only by a single nucleotide at the central (13th) base. We refer to the perfect match/mismatch pair as a 'probe pair'. Each gene is thus represented by 20 probe pairs. Normally, these 20 probe pairs are consolidated into a single expression level (known as 'Average Difference') for each gene using GeneChip software (Affymetrix) which calculates a trimmed mean of the perfect match minus mismatch differences in order to incorporate some measure of non-specific cross-hybridization to mismatch probes [<abbr bid="B2">2</abbr>]. Alternative methods for estimating message abundance have also been reported [<abbr bid="B3">3</abbr>,<abbr bid="B4">4</abbr>].</p>
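         <p>As an illustration only, the consolidation of probe pairs into a single trimmed-mean value can be sketched as follows. This is not the GeneChip implementation: the function name <it>average_difference</it> and the trimming rule (dropping a fixed number of the lowest and highest differences) are assumptions made for the sketch.</p>

```python
def average_difference(pm, mm, trim=1):
    """Trimmed mean of the PM - MM differences for one probe set.

    pm, mm: perfect match and mismatch intensities, one entry per probe pair.
    trim:   number of differences dropped from each end of the sorted list
            (a hypothetical trimming rule, chosen for illustration).
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    deltas = sorted(p - m for p, m in zip(pm, mm))
    kept = deltas[trim:len(deltas) - trim]
    if not kept:
        raise ValueError("trim removes every probe pair")
    return sum(kept) / len(kept)
```

         <p>Trimming limits the influence of a single aberrant probe pair on the consolidated value, which is the motivation for using a trimmed rather than a plain mean.</p>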
         <p>To reduce the number of probes per gene, we sought to identify the single probe pair for each gene that best approximated the Average Difference, a value that is based on all 20 probe pairs. To accomplish this, we first defined a training set of expression data derived from 141 human tumor samples of diverse cellular origins [<abbr bid="B5">5</abbr>]. For each gene on the array, we generated a vector corresponding to the normalized Average Difference value across the 141 samples. Next, we calculated the perfect match minus mismatch value for each of the 20 individual probe pairs for each gene on the array (referred to hereafter as delta, &#916;). In the final step, the 20 normalized &#916;s for each gene were ranked according to their similarity to the Average Difference vector across the 141 training samples, using Euclidean distance as the metric. The highest-ranking &#916; (&#916;<sub>h</sub>) was chosen for further evaluation in an independent test set. A schematic for this procedure is shown in Figure <figr fid="F1">1</figr>.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Schema for selection and evaluation of the single probe pair (&#916;<sub>h</sub>) that best approximates the Average Difference value derived from all 20 probe pairs</p>
            </caption>
            <text>
               <p>Schema for selection and evaluation of the single probe pair (&#916;<sub>h</sub>) that best approximates the Average Difference value derived from all 20 probe pairs. PM, perfect match; MM, mismatch. &#916; = PM - MM.</p>
            </text>
            <graphic file="gb-2002-3-12-research0073-1"/>
         </fig>
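         <p>The selection step for a single gene can be sketched as below. The text does not specify how the vectors were normalized, so the sketch assumes standardization to zero mean and unit standard deviation; the function names are likewise hypothetical.</p>

```python
import math


def standardize(values):
    """Scale a vector to zero mean and unit standard deviation.

    This normalization is an assumption; the source only says 'normalized'.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_best_probe_pair(avg_diff, probe_deltas):
    """Return (index, distance) of the probe pair closest to Average Difference.

    avg_diff:     Average Difference values, one per training sample.
    probe_deltas: one vector of PM - MM values per probe pair, with samples
                  in the same order as avg_diff.
    """
    target = standardize(avg_diff)
    distances = [euclidean(standardize(d), target) for d in probe_deltas]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]
```

         <p>Applied gene by gene across the training samples, the returned index identifies the probe pair that would be retained as &#916;<sub>h</sub>.</p>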
         <p>The independent test set consisted of expression data derived from 176 tumor samples that were entirely non-overlapping with the training set. We determined the ability of the training-set-derived &#916;<sub>h</sub> values to approximate the Average Difference values in the independent test set, compared to randomly selected &#916;s. As shown in Figure <figr fid="F2">2</figr>, 79.3% (&#177; 3.0%) of &#916;<sub>h</sub> values were within twofold of their respective Average Difference value, as compared to 57.8% (&#177; 5.1%) for randomly selected &#916;s. The relative error of the estimates was 0.8 (&#177; 0.1) for &#916;<sub>h</sub> values and 2.7 (&#177; 0.7) for randomly selected &#916;s. Overall, the distribution of &#916;<sub>h</sub> accuracies was distinct from that of randomly selected &#916;s (p &lt; 10<sup>-4</sup>, chi-squared test). This result indicates that the empirical selection of &#916;<sub>h</sub> is a better strategy for reducing probe numbers than random probe selection.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Comparison of Average Difference values with &#916;<sub>h</sub> and randomly selected &#916;s</p>
            </caption>
            <text>
               <p>Comparison of Average Difference values with &#916;<sub>h</sub> and randomly selected &#916;s. For each of the datasets shown, the proportion of genes whose &#916;<sub>h</sub> value is within twofold of the Average Difference is shown by the black bars. The same comparison is shown for random &#916;s (gray bars). Error bars indicate standard deviation, which reflects variation in the percentage of genes within twofold of the Average Difference between the 176 chips of the test set. Note that the &#916;<sub>h</sub> values approximate the Average Difference better than randomly selected &#916;s.</p>
            </text>
            <graphic file="gb-2002-3-12-research0073-2"/>
         </fig>
         <p>We next determined whether training-set-derived &#916;<sub>h</sub> values would be sufficient for pattern recognition and classification of the independent test set of samples. The 176 test samples fall into four binary classification problems: acute myeloid leukemia, AML, versus acute lymphoblastic leukemia, ALL (leukemia set A; <it>n</it> = 35); T-cell ALL versus B-cell ALL (leukemia set B; <it>n</it> = 23); diffuse large B-cell lymphoma survival prediction (<it>n</it> = 58); and medulloblastoma brain tumor survival prediction (<it>n</it> = 60), as described previously [<abbr bid="B6">6</abbr>,<abbr bid="B7">7</abbr>,<abbr bid="B8">8</abbr>]. We used a <it>k</it>-nearest neighbors (<it>k</it>-NN) prediction algorithm [<abbr bid="B9">9</abbr>] and applied it to these four classification problems using either the Average Difference values or the &#916;<sub>h</sub> values as the starting point. As shown in Table <tblr tid="T1">1</tblr>, classification accuracy based on &#916;<sub>h</sub> was nearly identical to that obtained using Average Difference values, despite the fact that 95% fewer probes were utilized. It should be noted that while &#916;<sub>h</sub> values approximated the Average Difference more accurately than random &#916;s (Figure <figr fid="F2">2</figr>), the random &#916;s also performed relatively well in these classification problems. It is possible, however, that classification performance would deteriorate when applied to more subtle classification problems, or when applied to samples of different tissue types. These results, taken together, demonstrate the feasibility of substantially reducing probe numbers without dramatically affecting performance.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Classification accuracy using Average Difference, randomly selected &#916;s, and &#916;<sub>h</sub> values</p>
            </caption>
            <tblbdy cols="6">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="3" ca="center">
                     <p>Error rate</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dataset</p>
                  </c>
                  <c ca="center">
                     <p>
                        <it>n</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Classification problem</p>
                  </c>
                  <c ca="center">
                     <p>&#916;<sub>h</sub> (%)</p>
                  </c>
                  <c ca="center">
                     <p>Random &#916;s (%)</p>
                  </c>
                  <c ca="center">
                     <p>Average Difference (%)</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Leukemia (set A)</p>
                  </c>
                  <c ca="center">
                     <p>35</p>
                  </c>
                  <c ca="left">
                     <p>ALL vs AML</p>
                  </c>
                  <c ca="center">
                     <p>3</p>
                  </c>
                  <c ca="center">
                     <p>2 &#177; 1</p>
                  </c>
                  <c ca="center">
                     <p>3</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Leukemia (set B)</p>
                  </c>
                  <c ca="center">
                     <p>23</p>
                  </c>
                  <c ca="left">
                     <p>T-ALL vs B-ALL</p>
                  </c>
                  <c ca="center">
                     <p>0</p>
                  </c>
                  <c ca="center">
                     <p>0 &#177; 1</p>
                  </c>
                  <c ca="center">
                     <p>0</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Lymphoma</p>
                  </c>
                  <c ca="center">
                     <p>58</p>
                  </c>
                  <c ca="left">
                     <p>Cured vs fatal</p>
                  </c>
                  <c ca="center">
                     <p>26</p>
                  </c>
                  <c ca="center">
                     <p>29 &#177; 5</p>
                  </c>
                  <c ca="center">
                     <p>24</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Medulloblastoma</p>
                  </c>
                  <c ca="center">
                     <p>60</p>
                  </c>
                  <c ca="left">
                     <p>Cured vs fatal</p>
                  </c>
                  <c ca="center">
                     <p>18</p>
                  </c>
                  <c ca="center">
                     <p>26 &#177; 4</p>
                  </c>
                  <c ca="center">
                     <p>24</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>In conclusion, the empirical approach to probe reduction presented here allows a systematic optimization of individual probe sets. Our studies specifically reinforce the notion that careful selection of probe pairs based on their hybridization behavior is a promising strategy for future chip design. Nevertheless, it remains likely that the use of multiple probes per gene will generate the most accurate and robust detectors. For diagnostic applications in particular, probe redundancy may significantly improve performance. For screening applications, however, the availability of small, limited-probe, genome-wide arrays could be useful.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <p>The raw data analyzed here has been previously reported [<abbr bid="B6">6</abbr>,<abbr bid="B7">7</abbr>,<abbr bid="B8">8</abbr>] and is available at [<abbr bid="B5">5</abbr>].</p>
         </sec>
         <sec>
            <st>
               <p>Approximation of Average Difference</p>
            </st>
            <p>To estimate the percentage of genes with &#916;<sub>h</sub> values within twofold of the Average Difference, for each gene we compared the value of &#916;<sub>h</sub> with the Average Difference for this probe set. The percentage of genes within twofold of the Average Difference was then averaged over the 176 chips of the test set. To evaluate random probe selection, for each gene a &#916; was chosen randomly and the percentage of genes within twofold of the Average Difference was similarly calculated. This process was repeated 20 times and then averaged. Values of both Average Difference and selected &#916;s were normalized and a threshold set at 100 units. Relative error for the estimates for &#916;<sub>h</sub> and randomly selected &#916; values was calculated as |&#916; - Average Difference|/Average Difference.</p>
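            <p>A minimal sketch of the two per-chip summary statistics follows. It assumes that the 100-unit threshold acts as a floor applied to both quantities before comparison; that reading, and the function names, are assumptions rather than details given in the text.</p>

```python
def floor_at(values, threshold=100.0):
    """Raise every value below the threshold up to the threshold (assumed)."""
    return [max(v, threshold) for v in values]


def fraction_within_twofold(deltas, avg_diffs, threshold=100.0):
    """Fraction of genes on one chip whose delta is within twofold of the
    corresponding Average Difference, after flooring both at the threshold."""
    d = floor_at(deltas, threshold)
    a = floor_at(avg_diffs, threshold)
    hits = sum(1 for x, y in zip(d, a) if 2.0 * y >= x >= y / 2.0)
    return hits / len(a)


def mean_relative_error(deltas, avg_diffs, threshold=100.0):
    """Mean of |delta - Average Difference| / Average Difference over genes."""
    d = floor_at(deltas, threshold)
    a = floor_at(avg_diffs, threshold)
    return sum(abs(x - y) / y for x, y in zip(d, a)) / len(a)
```

            <p>Each function takes the values for one chip, with one entry per gene; averaging the per-chip results over all chips gives the kind of summary reported in Figure <figr fid="F2">2</figr>.</p>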
         </sec>
         <sec>
            <st>
               <p>Rescaling</p>
            </st>
            <p>To account for minor variation in overall chip intensities, Average Difference values were scaled as previously described [<abbr bid="B8">8</abbr>]. For &#916;<sub>h</sub> values, scaling was adjusted by a slope and intercept obtained from a least-squares linear fit of the &#916;<sub>h</sub> values for each chip compared to a randomly selected reference chip.</p>
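            <p>The linear rescaling can be sketched as an ordinary least-squares fit of the reference chip against the chip being scaled. The direction of the fit (reference as the dependent variable) and the function names are assumptions made for this illustration.</p>

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rescale_to_reference(chip, reference):
    """Map one chip's values onto the intensity scale of a reference chip.

    chip, reference: values for the same genes, in the same order.
    """
    slope, intercept = linear_fit(chip, reference)
    return [slope * v + intercept for v in chip]
```

            <p>Fitting both a slope and an intercept corrects for differences in overall brightness and in background between chips.</p>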
         </sec>
         <sec>
            <st>
               <p>Classification</p>
            </st>
            <p>Average Difference and &#916; values were clipped to a minimum of 20 and a maximum of 16,000 units. A variation filter was applied that excluded genes that did not vary by at least threefold and 100 units across the entire dataset. To compare the classification accuracy for &#916;s and Average Difference, we applied a <it>k</it>-nearest neighbors (<it>k</it>-NN) [<abbr bid="B9">9</abbr>] binary classifier, implemented in the software package GeneCluster 2.0 and available at [<abbr bid="B10">10</abbr>], to each of the four classification problems as previously described [<abbr bid="B8">8</abbr>]. Average Difference or &#916; feature selection was performed with the signal-to-noise metric [<abbr bid="B6">6</abbr>], (&#956;<sub>class 0</sub> - &#956;<sub>class 1</sub>)/(&#963;<sub>class 0</sub> + &#963;<sub>class 1</sub>), where &#956; and &#963; represent the mean and standard deviation within each class, respectively, and the top-ranking features were fed into the <it>k</it>-NN algorithm. Performance was evaluated by leave-one-out cross-validation, whereby for each sample a prediction was made with a model trained on the remaining samples in the problem set, and the number of classification errors was tallied. Classifiers with variable numbers of features (1-100) and nearest neighbors (<it>k</it> = 3 or <it>k</it> = 5) were tested. The best-performing classification results are reported.</p>
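            <p>The pipeline described above can be sketched in a few functions. This is not the GeneCluster 2.0 implementation. The sketch makes several simplifying assumptions: features are ranked by the absolute signal-to-noise score, the population standard deviation is used, neighbors vote without weighting, and feature selection is repeated inside each cross-validation fold. All function names are hypothetical.</p>

```python
import math
from collections import Counter


def clip(values, low=20.0, high=16000.0):
    """Clip expression values to the range [low, high]."""
    return [min(max(v, low), high) for v in values]


def passes_variation_filter(values, min_fold=3.0, min_delta=100.0):
    """True if a gene varies at least min_fold and min_delta across samples.

    Expects values that have already been clipped, so the minimum is positive.
    """
    lowest, highest = min(values), max(values)
    return highest / lowest >= min_fold and highest - lowest >= min_delta


def signal_to_noise(values, labels):
    """(mean0 - mean1) / (sd0 + sd1) for one gene and binary labels 0/1."""
    groups = {0: [], 1: []}
    for v, y in zip(values, labels):
        groups[y].append(v)

    def mean(g):
        return sum(g) / len(g)

    def sd(g):
        m = mean(g)
        return math.sqrt(sum((v - m) ** 2 for v in g) / len(g))

    denom = sd(groups[0]) + sd(groups[1])
    return (mean(groups[0]) - mean(groups[1])) / denom if denom > 0 else 0.0


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows nearest to the query."""
    order = sorted(range(len(train_rows)),
                   key=lambda i: math.dist(train_rows[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_errors(genes, labels, n_features=10, k=3):
    """Count leave-one-out errors.

    genes:  list of per-gene vectors, each with one value per sample.
    labels: class label (0 or 1) per sample.
    """
    n = len(labels)
    errors = 0
    for held in range(n):
        keep = [i for i in range(n) if i != held]
        train_labels = [labels[i] for i in keep]
        # Rank genes using the training fold only, so the held-out sample
        # never influences feature selection.
        scores = [abs(signal_to_noise([g[i] for i in keep], train_labels))
                  for g in genes]
        top = sorted(range(len(genes)), key=lambda j: scores[j],
                     reverse=True)[:n_features]
        train_rows = [[genes[j][i] for j in top] for i in keep]
        query = [genes[j][held] for j in top]
        if knn_predict(train_rows, train_labels, query, k) != labels[held]:
            errors += 1
    return errors
```

            <p>Dividing the returned error count by the number of samples gives an error rate comparable in form to those listed in Table <tblr tid="T1">1</tblr>.</p>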
         </sec>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Michael Angelo and Michael Reich for programming help, Sridhar Ramaswamy for providing datasets, and Eric Lander for helpful discussions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Quantitative monitoring of gene expression patterns with a complementary DNA microarray.</p>
            </title>
            <aug>
               <au>
                  <snm>Schena</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shalon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>467</fpage>
            <lpage>470</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7569999</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Expression monitoring by hybridization to high-density oligonucleotide arrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Lockhart</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Dong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Byrne</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Follettie</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Gallo</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Chee</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Mittmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Horton</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>1996</pubdate>
            <volume>14</volume>
            <fpage>1675</fpage>
            <lpage>1680</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9634850</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data.</p>
            </title>
            <aug>
               <au>
                  <snm>Schadt</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ellis</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>J Cell Biochem Suppl</source>
            <pubdate>2001</pubdate>
            <issue>Suppl(37)</issue>
            <fpage>120</fpage>
            <lpage>125</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jcb.10073</pubid>
                  <pubid idtype="pmpid">11842437</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Affymetrix, Statistical Algorithms Reference Guide</p>
            </title>
            <pubdate>2001</pubdate>
            <url>http://www.affymetrix.com/support/technical/technotes/statistical_reference_guide.pdf</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Whitehead Institute, Center for Genome Research - Cancer Genomics Publications/Projects</p>
            </title>
            <url>http://www-genome.wi.mit.edu/cancer/pubs/feature_reduction</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.</p>
            </title>
            <aug>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gaasenbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Coller</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Loh</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Downing</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Caligiuri</snm>
                  <fnm>MA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.286.5439.531</pubid>
                  <pubid idtype="pmpid" link="fulltext">10521349</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning.</p>
            </title>
            <aug>
               <au>
                  <snm>Shipp</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Ross</snm>
                  <fnm>KN</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Kutok</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Aguiar</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Gaasenbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Angelo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Reich</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pinkus</snm>
                  <fnm>GS</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Med</source>
            <pubdate>2002</pubdate>
            <volume>8</volume>
            <fpage>68</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nm0102-68</pubid>
                  <pubid idtype="pmpid" link="fulltext">11786909</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Prediction of central nervous system embryonal tumor outcome based on gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Pomeroy</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gaasenbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sturla</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Angelo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McLaughlin</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Goumnerova</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Black</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>436</fpage>
            <lpage>442</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415436a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11807556</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <aug>
               <au>
                  <snm>Dasarathy</snm>
                  <fnm>BV</fnm>
               </au>
            </aug>
            <source>Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques.</source>
            <publisher>Washington, DC: IEEE Computer Society Press</publisher>
            <pubdate>1991</pubdate>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Whitehead Institute, Center for Genome Research - Cancer Genomics Software</p>
            </title>
            <url>http://www-genome.wi.mit.edu/cancer/software/software.html</url>
         </bibl>
      </refgrp>
   </bm>
</art>
