|
Figure 1.
Comparison of mRNA expression and protein abundance. (a) A plot comparing our mRNA reference expression set [29] with our newly compiled protein abundance dataset. The mRNA axis is in copies per
cell; the protein axis is in thousands of copies per cell. The protein dataset is the
result of iteratively fitting two MudPit datasets (MudPit-1 [32] and MudPit-2 [31]) and two two-dimensional electrophoresis datasets (2DE-1 [7] and 2DE-2 [28]). Given the semi-quantitative nature of the MudPit data [31], we transformed the data into a more quantitative set by fitting each set individually
onto our reference mRNA expression dataset. In addition, we fit the MudPit-1 dataset
onto the more finely-grained MudPit-2 dataset. Each of the datasets was then moved
back into 'protein space' using an inverse transformation derived from the 2DE-1 set,
as this set has the most precise values. These datasets were then combined into the
new reference abundance dataset. Where more than one dataset reported a value for
a given ORF, we took the value from the first dataset in the following order: 2DE-1, 2DE-2,
MudPit-2, MudPit-1. The resulting reference protein abundance dataset (N = 2044) had a correlation of 0.66 with the mRNA reference dataset. (b,c) Additionally, we show that when looking at specific subsets (subcellular localization
[52] or functional groups [34,35]) we find both higher and lower correlations within these groups. The lower correlations
generally reflect more heterogeneous categories. This analysis indicates
that although correlations may be weak across the global data, they tend to be
higher within smaller, well-defined subsets of ORFs. Further
analysis is available at [33].
Greenbaum et al. Genome Biology 2003 4:117 doi:10.1186/gb-2003-4-9-117 |
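
The priority-ordered merge described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary layout, and the ORF identifiers are hypothetical, and it assumes each dataset has already been transformed onto a common protein-abundance scale. The caption does not state which correlation measure was used, so the Pearson coefficient here is also an assumption.

```python
from math import sqrt

# Priority order taken from the caption: earlier datasets win on overlap.
PRIORITY = ["2DE-1", "2DE-2", "MudPit-2", "MudPit-1"]


def merge_by_priority(datasets, priority=PRIORITY):
    """Combine per-dataset {orf: abundance} maps into one reference map.

    For an ORF present in several datasets, the value from the
    highest-priority dataset is kept.
    """
    merged = {}
    for name in priority:
        for orf, value in datasets.get(name, {}).items():
            # setdefault keeps the first (highest-priority) value seen.
            merged.setdefault(orf, value)
    return merged


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_with_mrna(protein, mrna):
    """Correlate protein and mRNA values over the ORFs present in both."""
    shared = sorted(set(protein) & set(mrna))
    return pearson([mrna[o] for o in shared], [protein[o] for o in shared])
```

To compute a per-category correlation as in panels (b) and (c), one would restrict both maps to the ORFs in a given localization or functional group before calling `correlate_with_mrna`.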