Table 1

Estimated cost, output and time for a whole-genome sequencing experiment and an RNA-seq experiment using an Illumina HiSeq 2000 machine

Whole genome sequencing

| Stage | Step | 2011 cost | Output | 2011 time |
|---|---|---|---|---|
| Sample collection and experimental design | From blood samples (easy to collect) to brain tissue (hard to collect) | ~$100 onwards | | From a few hours to several days |
| Sequencing | Library preparation + running the sequencer (whole dual flow cell) | ~$6500 = ~$500 + ~$6000 | ~380 M reads/lane; 1 individual: ~1140 M total reads (~3 lanes for 30× coverage); ~250 GB (intermediate files) | ~11-12 days |
| Data storage, low-level processing | Alignment (transfer* and storing raw data + mapping) | ~$40 = ~$33 + ~$7 | 300 GB (BAM file) | ~1/2 day*** (including transferring 250 GB FASTQ, ~7.5 hrs) |
| | (Data transfer and storage for 10 days)*; ** | ~$40 | | ~8.5 hrs |
| Data reduction and management (high-level summaries***) | SNP calling (compute + transfer out) | < $5 = ~$4 + ~$0.60 | < 1 GB | ~3 hrs |
| | Indel calling (compute + transfer out) | < $35 = ~$32 + ~$0.60 | < 1 GB | ~1 day |
| | SV calling (compute + transfer out) | < $35 = ~$32 + ~$0.60 | < 1 GB | ~1 day |
| Downstream analyses | | > $100 K | ~310 GB | months |
| Total of sequencing, data management and reduction | | ~$6500 | ~310 GB | ~15 days |

RNA-Seq

| Stage | Step | 2011 cost | Output | 2011 time |
|---|---|---|---|---|
| Sample collection and experimental design | Same as for whole genome sequencing | | | |
| Sequencing | Library preparation + running the sequencer (whole dual flow cell) | ~$3300 = ~$300 + ~$3000 | ~380 M reads/lane | ~12-14 days |
| Data storage, low-level processing | Alignment (transfer* and storing raw data + mapping) | ~$5 = ~$3 + ~$2 | ~30 GB (BAM); ~22 GB (MRF) | < 2 hrs*** |
| | (Data transfer and storage for 10 days)*; ** | < $4 | | < 1 hr |
| Data reduction and management (high-level summaries***) | Gene and exon expression quantification | < $1 | < 1 M | < 1 hr (1 CPU) |
| | Isoform quantification | ~$6 | < 1 M | ~4 hrs |
| Downstream analyses | | > $100 K | ~30 GB | months |
| Total of sequencing, data management and reduction | | ~$3500 | ~30 GB | ~12-14 days |


*Assuming a 10 MB/s transfer rate. **The cost of transferring 300 GB (BAM file) if the mapping is performed locally. ***Sixteen CPUs were used for all calculations, unless specified. BAM, Binary Sequence Alignment/Map; CPU, central processing unit; GB, gigabyte; MB, megabyte; SNP, single nucleotide polymorphism; ~, approximately.

The table estimates the current (2011) costs of a whole-genome sequencing experiment and a functional genomic experiment (RNA-seq) using an Illumina HiSeq 2000 machine. The cost of sequencing is based on that reported by the Center for Cancer Research [53]. It is assumed that the same human sample is used for the genome sequence and for the RNA-seq experiment. In this scenario, a group of four technicians and bioinformaticians can run the entire pipeline. The costs related to data processing are estimated by assuming that all tasks are performed in the Amazon Web Services 'cloud' environment. Pricing is based on the 'US standard' pricing for Amazon S3 (storage) [54] and a cost of $0.68 per hour (US East, Virginia) for Amazon EC2 (computation) [55] (July 2011). See Additional file 1 for a version of this table in a colored layout for easier reading.
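The transfer times and compute costs in the table can be roughly checked from the stated 10 MB/s transfer rate and the $0.68 per hour EC2 price. The Python sketch below is only a back-of-the-envelope illustration, not the authors' calculation. The function names are hypothetical, 1 GB is taken as 1000 MB, and the use of two EC2 instances to provide the sixteen CPUs is an assumption that is not stated in the source; it is chosen here because it approximately reproduces the compute figures.

```python
# Back-of-the-envelope check of transfer times and compute costs in Table 1.
# Values marked "from the footnote" come from the source; the rest are assumptions.

TRANSFER_RATE_MB_PER_S = 10.0  # from the footnote
EC2_PRICE_PER_HOUR = 0.68      # from the footnote (July 2011), in US dollars
ASSUMED_INSTANCES = 2          # ASSUMPTION: not stated in the source


def transfer_hours(size_gb, rate_mb_per_s=TRANSFER_RATE_MB_PER_S):
    """Hours needed to transfer size_gb gigabytes (1 GB taken as 1000 MB)."""
    seconds = size_gb * 1000.0 / rate_mb_per_s
    return seconds / 3600.0


def compute_cost(hours, instances=ASSUMED_INSTANCES, price=EC2_PRICE_PER_HOUR):
    """Dollar cost of running `instances` machines for `hours` hours."""
    return hours * instances * price


if __name__ == "__main__":
    for label, size_gb in [("FASTQ", 250), ("BAM", 300)]:
        print(f"{label}: {size_gb} GB -> {transfer_hours(size_gb):.1f} h")
    for label, hours in [("SNP calling", 3), ("Indel calling", 24),
                         ("Isoform quantification", 4)]:
        print(f"{label}: {hours} h -> ${compute_cost(hours):.2f}")
```

Under these assumptions the sketch gives about 6.9 and 8.3 hours for the 250 GB and 300 GB transfers (the table lists ~7.5 and ~8.5 hrs), and about $4.08, $32.64 and $5.44 for SNP calling, indel calling and isoform quantification (the table lists ~$4, ~$32 and ~$6). The small differences are expected from rounding and from overheads the sketch ignores.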

Sboner et al. Genome Biology 2011, 12:125. doi:10.1186/gb-2011-12-8-125