Affymetrix microarrays have become increasingly popular in gene-expression studies; however, limitations of the technology have not been well established for commercially available arrays. The hybridization signal has been shown to be proportional to actual transcript concentration for specialized arrays containing hundreds of distinct probe pairs per gene. Additionally, the technology has been described as capable of distinguishing concentration levels within a factor of 2, and of detecting transcript frequencies as low as 1 in 2,000,000. Using commercially available arrays, we assessed these representations directly through a series of 'spike-in' hybridizations involving four prokaryotic transcripts in the absence and presence of fixed eukaryotic background. The contribution of probe-target interactions to the mismatch signal was quantified under various analyte concentrations.
A linear relationship between transcript abundance and signal was consistently observed between 1 pM and 10 pM transcripts. The signal ceased to be linear above the 10 pM level and commenced saturating around the 100 pM level. The 0.1 pM transcripts were virtually undetectable in the presence of eukaryotic background. Our measurements show that preponderance of the signal for mismatch probes derives from interactions with the target transcripts.
Landmark studies outlining an observed linear relationship between signal and transcript concentration were carried out under highly specialized conditions and may not extend to commercially available arrays under routine operating conditions. Additionally, alternative metrics that are not based on the difference in the signal of members of a probe pair may further improve the quantitative utility of the Affymetrix GeneChip® array.
Even though the DNA microarray is still an emerging technology, its usefulness as a profiling tool is well established. Affymetrix GeneChip® arrays enable the concurrent assessment of expression levels for thousands of genes in a single experiment. At the molecular level, however, the microarray experiment is a challenging biophysical problem that is extremely dependent on probe-target kinetics, specificity and design. Among the principal sources of variability are the nonspecific interactions due to combinatorial complexity of the genome, the thermodynamic equivalence of probes, the accuracy and spatial uniformity of probe synthesis and the preparation, amplification and fractionation of cDNA and cRNA.
The photolithographically synthesized oligonucleotide microarrays that underlie the Affymetrix GeneChip® array use pairs (typically 16 or 20) of perfect-match (PM) and mismatch (MM) features. Each feature is a rectangular region containing oligonucleotides complementary to a corresponding region of a gene. Because of the inherent difficulties of oligonucleotide synthesis, the proportion of full-length (25 mer) probes within a feature is rather low . The PM probes are distinguishable from the MM probes only by the nucleotide in the 13th position. The expressed design intent behind the MM feature is to quantify the background noise (for example, scanner noise) and nonspecific interactions (for example, cross-hybridization) embedded within the PM signal. To arrive at a pristine measure of the signal attributable to probe-target interaction, it has been suggested [2,3,4] that the intensity for the MM feature should be subtracted from that of the PM feature for each probe pair and subsequently averaged (excluding outliers) to produce the average difference intensity (ADI). This heuristic is currently utilized to characterize the expression level for a given gene.
While the importance of differentiating actual expression from noise cannot be overemphasized, no published experiment establishes the assumption that the MM signal closely resembles the nonspecific component of the PM signal. It has been suggested  that a significant portion of the MM signal may derive from probe-target rather than background interactions; however, the relative contributions of specific and nonspecific components have not been established. In the present study, an assay was specifically designed to isolate the fundamental components of the PM and MM signals. The stratification of nonspecific interactions unique to or common to the PM and MM signals was not addressed.
Our results show that for transcript concentrations above 100 pM, the nonspecific component of PM signal is a negligible part of the MM signal. In fact, the greater part of the MM signal reflects interaction with the corresponding target transcript. This prevents the ADI (PM - MM) from being sensitive to changes in transcript concentration above the 100 pM level (approximately 1 in 1,500 transcript frequency). Comparison of absolute values of probe intensities across the different experimental conditions of the study suggest that thermodynamic equilibrium may be dominated by stable nonspecific interactions essentially decreasing target availability (that is, cross-target bridges) in the presence of complex cRNA. Whereas our results show that quantitative detection of target transcripts is possible with commercially available arrays, they also demonstrate the potential of alternative metrics for further improving the power of the analysis.
Biotinylated target cRNA from four prokaryotic genes - lys, dap, phe and thr from Bacillus subtilis - was collectively hybridized to Test2 arrays with and without 0.05 g/l of complex cRNA background generated from human brain total RNA. This background concentration roughly corresponds to 150 nM (0.05 g/l / (330 g/(mol × nucleotide) × 1,000 nucleotides)). The target transcript frequency can then be simply computed by dividing transcript concentration by total RNA concentration. B. subtilis genes were selected on the basis of their designation as standardization controls. In accordance with the probe array design, genes encoded on the Test2 array contain three probe sets corresponding to the 5', middle and 3' regions. Thus, a total of 12 distinct controlled target-transcript measurements were obtained per hybridization. Precision in target-transcript concentration was increased through a series of tenfold dilutions from a presynthesized in vitro transcribed mixture. Each combination of dilution and background was replicated four times (array availability) to account for and assess variability.
Figures 1 and 2 show that the plot of intensity versus target concentration has the typical sigmoidal shape encountered in chemical kinetics. The error bars reflect the range of observed values, which increases for measurements obtained subsequent to the lag phase and is greatest for the 100 pM level. Table 1 supplements these graphs by distinguishing values at low transcript concentrations. Table 2 depicts averaged, pairwise ADI ratios across consecutive dilution levels (four replicates per dilution; 16 pairs in total) after normalization. Given the sensitivity of the ratio to small absolute values, the variance of the ADI ratios in the 1-0.1 pM range was comparable to that of the mean, and was highly influenced by the outlier-removal algorithm. It is worth noting that normalization did not have a significant effect owing to the extremely uniform conditions of this study.
Figure 1. PM, MM, and ADI signals without cRNA backgroun d. PM, red; MM, blue; ADI, green. (a) PheX_3; (b) ThrX_5; (c) DapX_M; (d) LysX_5.
Figure 2. PM, MM, and ADI signals with cRNA background. PM, red; MM, blue; ADI, green. (a) PheX_3; (b) ThrX_5; (c) DapX_M; (d) LysX_5.
Table 1. ADI, PM, MM and signal-to-noise ratio (data not normalized)
Table 2. ADI fold changes
A cursory review of Table 2 will establish that under both hybridization conditions (target transcripts versus target transcripts + cRNA background), a tenfold reduction in the quantity of target transcript was consistently reflected by the ADI in the 1-10 pM range. Additionally, the window of target-transcript concentration under which proportionality was preserved varied in accordance with background cRNA content. In the cRNA sample comprised solely of target transcripts and hybridization controls (bioB, bioC, bioD, cre and dap), approximate ratio equality was witnessed between 0.1-10 pM. Under conditions more akin to a standard assay (brain cRNA background), the range of proportionality shifted to 1-100 pM. In both cases, the ADI peaks and subsequently plateaus above 100 pM (Figures 1, 2 and Table 1 for PheX_3) as the rate of increase in the MM signal becomes equivalent to or greater than that of the PM signal. In the veritable absence of nonspecific interactions, background noise is bounded by the absolute value of the 0.1 pM MM signal. Analogously, the nonspecific interaction component of the MM signal is bounded by the observed increase in the 0.1 pM signal in the presence of complex cRNA. Consequently, by comparing both sides of Table 1, we conclude that less than 20% of the MM signal at the 100 pM level reflects nonspecific binding. The major component of the MM signal (66-80%) therefore constitutes interaction with the intended PM target transcript. Absolute values for the same transcript concentration differ significantly, despite averaging over 20 probe pairs, which shows the influence of selected probe sequences (Table 1).
Addition of a complex cRNA background elicited a profound effect on observed intensities, especially at the lower range of concentration. The absolute values of PM and MM increased for the 0.1 pM level whereas they decreased for the 10 pM and 100 pM levels (Table 1). Surprisingly, the decrease in absolute values for 100 pM resulted in the improved sensitivity of the 100-10 pM average ADI ratio (factor of 6 versus factor of 3). In addition, the signal-to-noise ratio (when defined as ADI/MM) decreased dramatically in the 0.1-10 pM range whereas the standard error of the 10-1 pM ADI ratios increased (Table 1). For the 0.1 pM target concentration, all 12 target transcripts were detectable in the absence of brain cRNA background. However, the presence of complex cRNA essentially rendered them undetectable. It is evident that the complex cRNA background had a nonlinear effect on the ADI, with up to a fourfold difference in the average ADIs observed for the same quantity of transcript (Figure 3).
Figure 3. Sigmoidal fits to ADIs for PheX_3. Upper curve was obtained without cRNA background.
To study the effect of hybridization duration, the 0.1, 1 and 10 pM hybridizations were repeated in the presence of fixed eukaryotic background with the hybridization time doubled from 16 to 32 hours. A marginal increase in the average ADI was observed (by around 10%) although such increase was within the observed 16-hour range. Additionally, the increase in signal for the 10 pM level remained consistently less than half of that obtained in the absence of complex cRNA. The increased hybridization time did result in partial detection of prokaryotic controls at 0.1 pM (5 out of 12 transcripts).
Finally, to examine reproducibility, a scatter plot of the log of gene ADIs for two distinct hybridizations (10 pM transcript concentration in the presence of fixed eukaryotic background) was produced (Figure 4a). About 10% of the points corresponding to positive ADIs fell outside the region corresponding to a factor of 2 deviations from the mean.
Figure 4. Correlation between hybridization results obtained for the same RNA sample. (a) Log space plot; (b) linear space plot. Uniform 'factor of 2' region (see text) in the log space plot does not have an intuitive physical counterpart in linear space. It underestimates variability at the low end of ADIs.
This study was designed to validate objectively the sensitivity of the Affymetrix GeneChip® array. Whereas excellent earlier studies show a quantitative relationship between the ADI and transcript abundance [2,3], the experimental design of this study differs significantly from those described in earlier publications in order to assess the system's capabilities under conditions closer to real-life experiments. These differences include: increased precision in spiked transcript concentration through dilution from the same IVT product; utilization of Test2 chips reflecting current probe sets and production chemistry; no post-hybridization streptavidin amplification; multiple measurements per condition; comparison of transcript intensities obtained with absent or present complex cRNA background; and the use of commercially available Test2 arrays rather than custom-made arrays containing hundreds of probe pairs per gene.
Of practical consideration is the ability of the system to provide a robust, directional measure of transcript levels. In order for the system to reliably characterize gene expression, correspondence between the quantity of transcript and the numerical measure selected as its proxy (intensity) must be demonstrated. It has been shown previously that the ADI is proportional to cRNA level and predictive of absolute RNA concentration within a factor of 2 [2,3,4]. Typically, the latter is demonstrated by plotting the log of gene ADIs between two identical tissue samples with 98%  of the points falling within a region bounded by the lines y = 2x and y = 0.5x (factor of 2 region). Although never rigorously derived, this region can be viewed as an empirical analogy of the confidence region. It is clear that although this assertion may hold for an average gene, it will be more likely to fail for rare genes with small ADIs (Figure 4a). Comparison of Figure 4a and b shows how the log scale gives a somewhat misleading impression of the uniformity of the 'factor of 2' region across the dynamic range. In addition, it is important to remember that sequence diversity and amount of cRNA will nonlinearly affect probe-target binding kinetics and, by extension, intensities. Even averaged ADIs for an identical quantity of transcript differed dramatically (up to a factor of 4), solely because of the presence of eukaryotic cRNA background. This effect is difficult to account for through linear normalization schemes currently used. It is important to note that the experimental conditions underlying the initial 'factor of 2' claim are quite specific and relate to cRNA derived from the same yeast source.
As evidenced by the range bars in Figures 1 and 2, the variability of actual transcript intensities appears to increase with concentration. It has been suggested that variability can be reduced by selecting pixels adaptively . However, it is not clear how to account algorithmically for large pixel-to-pixel variability from the biophysical or statistical standpoint, given that each pixel is supposed to contain an identical distribution of millions of oligonucleotides. A contributing factor to spatial variability appears to be microscopic defects on the surface of the wafer . Investigation of the nature of pixel-to-pixel variability may potentially improve the predictive utility of obtained data.
The observed linear range of the ADI is narrower than that previously reported , and potentially inflated from what might typically be encountered given the reduction in procedural variability achieved through the comparatively uniform conditions of the experiments presented here. It is worth noting, however, that the linear range initially reported  was obtained using a custom array containing probe sets with more than 500 PM/MM probe pairs per gene. Consequently, it is not surprising that even with technological advances in chip manufacturing over the past four years, we are still unable to reproduce fully the linear range of the ADI using commercial chips with 20 or fewer PM/MM probe pairs per probe set.
The premise that subtraction of the MM from PM essentially serves to extract constructively the nonspecific components (cross-hybridization, noise, and so on) of the signal common to both should be reconsidered in the light of the results obtained in this study. It appears to perform inadequately for low and high transcript concentrations alike, although the basis for the failure differs. At high concentrations (above 100 pM level), the rate of increase of the MM signal can eclipse that of the PM signal (Figures 1d, 2d, Table 1 for PheX_3) resulting in an eventual decline in the ADI (Figures 1d, 2d for PheX_3). For example, the average ADI of the ThrX_3 probe set at 1 nM was approximately 47% higher than at 10 nM (Table 1). Probe-target binding can be characterized by sigmoidal dose-response curves with different parameters. While both PM and MM seem to have comparable maximum slopes, due to lower target affinity, the linear response phase of the MM signal is shifted from that of the PM signal (it occurs at higher concentrations). Consequently, the ADI decreases for high target concentrations where the slope of the PM sigmoidal has tapered. Reports that the ADI saturates simply as a result of saturation of both the MM and PM signals  do not seem to be supported by our results. It should be noted that part of the observed saturation effect is attributable to limited scanner sensitivity, which might reflect a limitation of our installation. Scanner-related saturation is functionally related to absolute intensity, which, in turn, is dependent on underlying probe kinetics. Examination of individual probe signals suggests that scanner effects are likely to be more pronounced at higher target concentrations (1-10 nM), as many of the probes have reached the maximum detectable intensity (around 46,000). However, saturation in the 10-100 pM range is most likely to be dominated by the kinetic properties of the PM and MM probes.
At the lower end of transcript concentration, it seems that eukaryotic background affects the PM and MM probes nonsymmetrically, generally resulting in a decrease in the ADI. Notably, the DapX_M probe set in Table 1 has a positive ADI for 0.1 pM level even though the MM signal is greater than the PM signal before outlier removal. Additionally, the same outlier-removal algorithm rendered the small 0.1 pM ADI for the PheX_3 nonexistent, suggesting that simple heuristics can fail in a low signal-to-noise environment. Outlier classification is perhaps best addressed using a functional (kinetic), as opposed to a purely inferential, profile constructed from a large repository of experimental data. Given the large number of experiments used to derive the ADI, such an approach would seem eminently feasible. In lieu of an ideal functional profile, the use of alternative heuristics may be promising. For example, PM+MM-background (PM+MM also mentioned in ), while being less sensitive in the 1-10 pM range, was more sensitive for higher ranges and more robust for 0.1-1 pM.
Given that a significant number of genes of biological interest have transcript frequencies at or below 1 pM , the commercial usefulness of the system is constrained by the minimum abundance level that is reliably detectable. A current limit of 1 in 2 × 106 transcripts, that is, around 0.075 pM or 1 in 7 cells) has been reported . Indeed, the Affymetrix GeneChip® array was able to detect this very low transcript level in the absence of eukaryotic background. However, after addition of cRNA background, transcripts at the 0.1 pM concentration became essentially undetectable for all 12 independent transcripts, whereas 1 pM transcripts remained robustly detected. Our results are consistent with a recent study  showing a range of detection between 1 in 300,000 and 1 in 50,000 (0.5-1.5 pM). It is possible to argue that post-hybridization amplification would improve detection, but obviously at the expense of potentially saturating expression levels of more abundant genes. Perhaps scanning images before and after amplification could maximize detection without suffering saturation penalties. Longer hybridization cycles seem to be a viable alternative, as these enabled partial detection of transcripts (about 5 out of 12) at the 0.1 pM level without significantly affecting high-end intensities. Given the current inability to localize transcripts with similar abundances on different arrays, care should be taken to ensure a sufficient overall number of replicates to obtain a small standard error for 0.1 pM level measurements. Also, new methods of mRNA amplification  that robustly increase the hybridization concentration of rare transcripts might improve detectability but could simultaneously distort the underlying RNA expression profile. The addition of eukaryotic background had a profound effect on the properties of thermodynamic equilibrium of probe-target binding. The decrease in 10 pM and 100 pM intensities suggests the presence of complex, stable interactions (that is, cross-target binding) which persist subsequent to the 32-hour hybridization. A similar hypothesis was previously proposed  where probe-probe interactions were implicated. We believe that cross-target interactions are more important in our case as the saturation level was not affected by cRNA background. The marginal increase in the signal after 32 hours of hybridization, though, suggests that thermodynamic equilibrium was not quite reached in 16 hours. The increase in signal for 0.1 pM is obviously due to increased background noise. The net result of cRNA background is that it has a nonlinear effect on the ADIs (Figure 3), which is impossible to compensate for using linear methods.
Our findings suggest that while high-density microarrays are a convenient way of monitoring thousands of genes simultaneously, increased care is needed in the design of experiments and scrutiny of the predictive utility of the numerical measure used to represent gene expression. An increase in the number of replicates is preferable to reliance upon magnitudes of fold changes, as the latter is not always linearly related to target concentration and is extremely variable for low transcript concentrations. In light of the fact that the MM signal predominantly characterizes interactions with the target transcript, a different heuristic used to weight the MM signal accordingly for each probe pair might further improve the quality of Affymetrix GeneChip® array data. Perhaps such an approach underlies the announced supplanting of historical algorithms in the impending GeneChip 5.0 release. Finally, longer hybridization times can improve partial detection of transcripts expressed at very low levels.
Materials and methods
Preparation of labeled targets for hybridization
The Test2 array (Affymetrix, Santa Clara, CA) contains probes corresponding to commonly expressed genes from the human, mouse, rat and yeast genomes, along with several prokaryotic control genes. For each of these genes, probes derived from the 5', middle and 3' portions of the genes are arrayed. The prokaryotic controls used in this study contain engineered poly(A)+ tails and are available through the American Type Culture Collection  (dapB, ATCC 87486; lysA, ATCC 87482; pheB, ATCC 87483; and thrC, ATCC 87484). Methods for preparing cRNA and subsequent steps leading to hybridization and scanning of the Test2 arrays were provided by the manufacturer. Briefly, amplified and purified prokaryotic control vectors were linearized at the 5' end using XhoI and purified by gel electrophoresis. Poly(A)+ cDNA was transcribed in vitro by incorporation of biotinylated CTP and UTP (Enzo Diagnostics, Farmingdale, NY) using a BioArray High Yield RNA Transcript Labeling kit according to the manufacturer's instructions. The labeled cRNA was purified using RNeasy spin columns (Qiagen, Chatsworth, CA), followed by DNase I treatment and a second round of RNeasy spin-column purification. The integrity of all labeled and purified transcript was checked by denaturing gel electrophoresis. Each of the four transcripts, dap, lys, phe and thr, were pooled, and fragmented in fragmentation buffer (40 mM Tris-acetate pH8.1, 100 mM potassium acetate, 30 mM magnesium acetate), and brought up in hybridization mix according to the manufacturer's protocols (1 M NaCl, 10 mM Tris-acetate pH8.1, 0.01% Triton-X 100, 100 μg/ml herring sperm DNA, and 50 pM biotinylated control oligo 948) such that each of the combined transcripts would yield a final concentration of 10 nM. All subsequent serial dilutions were carried out using components of the hybridization buffer to yield final concentrations of 1 nM, 100 pM, 10 pM, 1 pM and 0.1 pM.
For all experiments simulating a complex background of eukaryotic transcripts, 20 μg total human brain RNA (Clontech, Palo Alto, CA) was used and processed according to Affymetrix protocols. Subsequent to in vitro transcription, 4 μg labeled cRNA was included into a final volume of 80 μl hybridization mix per array, also containing the labeled prokaryotic transcripts.
Array hybridization and scanning
Before hybridization, the cRNA samples were heated to 94°C for 5 min, equilibrated to 40°C for 5 min, and clarified by centrifugation at 14,000 g for 5 min. Aliquots of the pooled samples and tenfold dilutions of the pool in hybridization buffer were hybridized to Test2 arrays at 45°C for 16 h in a rotisserie oven set at 60 rpm. Subsequently, the arrays were washed with non-stringent wash buffer (0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA, 0.01% Tween-20) at 25°C followed by stringent wash buffer (100 mM methyl ethyl sulfonate (MES), 100 mM NaCl, 0.01% Tween-20) at 50°C in a fluidics station (Affymetrix). The probe arrays were then stained by adding 80 μl staining solution (100 mM MES, 1 M NaCl, 0.05% Tween-20 with 2 mg/ml acetylated BSA (Sigma, St Louis, MO) and 10 μg/ml streptavidin R-phycoerythrin (Molecular Probes, Portland, OR) for 30 min at 25°C. After staining, the arrays were rinsed with non-stringent wash buffer at 25°C and scanned using a GeneArray confocal scanner (Agilent, Palo Alto, CA).
To evaluate the influence of hybridization duration on intensity values, the 0.1 pM, 1 pM and 10 pM prokaryotic target concentrations were repeated in experiments where the hybridization time was extended to 32 h with all other conditions left unchanged.
Data analysis was performed using the Affymetrix GeneChip array 4.0 software. Four chips representing one target concentration level (that is, 10 pM) were paired with four chips from the following target concentration level (that is, 1 pM) resulting in 16 total pairs. For each pair, the data was multiplied by a normalization factor (calculated with a mask excluding prokaryotic target transcripts) to make the average signal for both arrays equivalent. Fold changes were subsequently averaged across different pairs, excluding fold changes involving negative ADIs. All normalization factors were within 10% of 1. Statistical curve fitting was carried out using GraphPad Prism 2.01. Sigmoidal dose-response curves were fitted to PM and MM data, allowing all four parameters to be variable. The ADI was fitted with a 100 segment cubic spline.
We thank Michael Liebman for his support of this project and Paul Cannon, Al Baecker and Debra Cockayne for their helpful suggestions.
Nat Biotechnol 1996, 14:1675-1680. PubMed Abstract
Nat Biotechnol 1997, 15:1359-1367. PubMed Abstract