Table 1

Estimates of a whole-genome sequence and an RNA-seq experiment using an Illumina HiSeq 2000 machine

| Stage | Whole-genome sequencing: step | 2011 cost | Output | 2011 time | RNA-seq: step | 2011 cost | Output | 2011 time |
|---|---|---|---|---|---|---|---|---|
| Sample collection and experimental design | From blood samples (easy to collect) to brain tissue (hard to collect) | ~$100 onwards | | From a few hours to several days | Same as for whole-genome sequencing | | | |
| Sequencing | Library preparation + running the sequencer (whole dual flow cell) | ~$6500 = ~$500 + ~$6000 | ~380 M reads/lane; 1 individual: ~1140 M total reads (~3 lanes for 30× coverage); ~250 GB (intermediate files) | ~11-12 days | Library preparation + running the sequencer (whole dual flow cell) | ~$3300 = ~$300 + ~$3000 | ~380 M reads/lane | ~12-14 days |
| Data storage, low-level processing | Alignment (transfer* and storing raw data + mapping) | ~$40 = ~$33 + ~$7 | 300 GB (BAM file) | ~1/2 day*** (including ~7.5 hrs to transfer 250 GB of FASTQ) | Alignment (transfer* and storing raw data + mapping) | ~$5 = ~$3 + ~$2 | ~30 GB (BAM); ~22 GB (MRF) | < 2 hrs*** |
| | (data transfer and storage for 10 days)*; ** | ~$40 | | ~8.5 hrs | (data transfer and storage for 10 days)*; ** | < $4 | | < 1 hr |
| Data reduction and management (high-level summaries***) | SNP calling (compute + transfer out) | < $5 = ~$4 + ~$0.60 | < 1 GB | ~3 hrs | Gene and exon expression quantification | < $1 | < 1 M | < 1 hr (1 CPU) |
| | Indel calling (compute + transfer out) | < $35 = ~$32 + ~$0.60 | < 1 GB | ~1 day | Isoform quantification | ~$6 | < 1 M | ~4 hrs |
| | SV calling (compute + transfer out) | < $35 = ~$32 + ~$0.60 | < 1 GB | ~1 day | | | | |
| Downstream analyses | | > $100 K | ~310 GB | months | | > $100 K | ~30 GB | months |
| Total of sequencing, data management and reduction | | ~$6500 | ~310 GB | ~15 days | | ~$3500 | ~30 GB | ~12-14 days |
* Assuming a 10 MB/s transfer rate. ** The cost of transferring 300 GB (BAM file) if the mapping is performed locally. *** Sixteen CPUs were used for all calculations, unless specified.

BAM, Binary Sequence Alignment/Map; CPU, central processing unit; GB, gigabyte; MB, megabyte; SNP, single nucleotide polymorphism; ~, approximately.

The table gives an estimate of the current costs of a whole-genome sequencing experiment and a functional genomic experiment (RNA-seq) using an Illumina HiSeq 2000 machine. The cost of sequencing is based on that reported by the Center for Cancer Research [53]. It is assumed that the same human sample is used for the genome sequence as well as for the RNA-seq experiment. In this scenario, a group of four technicians and bioinformaticians can run the entire pipeline. The costs related to data processing are estimated by assuming that all tasks are performed in the Amazon Web Services 'cloud' environment. Pricing is based on the 'US standard' rate of Amazon S3 (storage) [54] and a cost of $0.68 per hour (US East, Virginia) for the use of Amazon EC2 (computation) [55] (July 2011). See Additional file 1 for a version of this table in a colored layout for easier reading.

Sboner et al. Genome Biology 2011 12:125 doi:10.1186/gb-2011-12-8-125
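
The transfer times and compute costs in the table follow from simple arithmetic on the figures given in the footnotes. The sketch below is a rough check rather than the authors' calculation. It uses the stated 10 MB/s transfer rate and $0.68 per hour EC2 price. It also assumes that 1 GB = 1000 MB and that the sixteen CPUs are supplied by two 8-core instances, each billed at the hourly rate. Neither assumption is stated in the table, and the function and constant names are illustrative only.

```python
# Back-of-the-envelope check of transfer times and compute costs.
# Assumptions NOT taken from the table: 1 GB = 1000 MB, and the 16 CPUs
# come from two 8-core instances, each billed at $0.68 per hour.

TRANSFER_RATE_MB_PER_S = 10.0  # from the footnote: 10 MB/s
EC2_USD_PER_HOUR = 0.68        # from the footnote: EC2 price, July 2011
ASSUMED_INSTANCES = 2          # hypothetical: 2 x 8 cores = 16 CPUs


def transfer_hours(size_gb: float,
                   rate_mb_per_s: float = TRANSFER_RATE_MB_PER_S) -> float:
    """Hours needed to move size_gb gigabytes at a constant rate."""
    seconds = size_gb * 1000.0 / rate_mb_per_s
    return seconds / 3600.0


def compute_cost_usd(hours: float,
                     instances: int = ASSUMED_INSTANCES,
                     usd_per_hour: float = EC2_USD_PER_HOUR) -> float:
    """Cost in US dollars of running `instances` machines for `hours` hours."""
    return hours * instances * usd_per_hour


if __name__ == "__main__":
    for label, size_gb in [("FASTQ", 250), ("BAM", 300)]:
        print(f"{label}: {size_gb} GB -> {transfer_hours(size_gb):.1f} h")
    for label, hours in [("SNP calling", 3), ("Indel calling", 24)]:
        print(f"{label}: {hours} h -> ${compute_cost_usd(hours):.2f}")
```

Under these assumptions the transfers come out at roughly 7 and 8 hours, the same order as the ~7.5 and ~8.5 hrs listed, and the compute components come out close to the ~$4 (SNP calling) and ~$32 (indel calling) entries. The two-instance reading is only one configuration that happens to reproduce those figures.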