Open Access Highly Accessed Method

SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data

Wenlong Jia1,2, Kunlong Qiu1,2, Minghui He1,2, Pengfei Song2, Quan Zhou1,2,3, Feng Zhou2,4, Yuan Yu2, Dandan Zhu2, Michael L Nickerson5, Shengqing Wan1,2, Xiangke Liao6, Xiaoqian Zhu6,7, Shaoliang Peng6,7, Yingrui Li1,2, Jun Wang1,2,8,9 and Guangwu Guo1,2*

Author Affiliations

1 BGI Tech Solutions Co., Ltd, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China

2 BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China

3 School of Life Science and Technology, University of Electronic Science and Technology of China, No.4, Section 2, North Jianshe Road, Chengdu 610054, China

4 School of Bioscience and Bioengineering, South China University of Technology, Guangzhou Higher Education Mega Centre, Panyu District, Guangzhou 510006, China

5 Cancer and Inflammation Program, National Cancer Institute, National Institutes of Health, 1050 Boyles Street, Frederick, MD 21702, USA

6 School of Computer Science, National University of Defense Technology, No.47, Yanwachi street, Kaifu District, Changsha, Hunan 410073, China

7 State Key Laboratory of High Performance Computing, National University of Defense Technology, No.47, Yanwachi street, Kaifu District, Changsha, Hunan 410073, China

8 The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, DK-1165 Copenhagen, Denmark

9 Department of Biology, University of Copenhagen, DK-1165 Copenhagen, Denmark


Genome Biology 2013, 14:R12  doi:10.1186/gb-2013-14-2-r12

Published: 14 February 2013

Additional files

Additional file 1:

Table S1 - information on all known fusions from two previous studies. Additional detailed information on the known fusions reported in two previous studies (melanoma and breast cancer research). All fusion information is based on release 59 of the Ensembl hg19 annotation database.

Format: XLSX Size: 13KB

Open Data

Additional file 2:

Table S2 - software selected for evaluation of performance and sensitivity.

Format: XLSX Size: 10KB

Additional file 3:

Supplementary notes.

Format: DOCX Size: 440KB

Additional file 4:

Table S3 - detailed information on performance and fusion detection sensitivity of six tools. CPU time, maximum memory usage and fusion detection sensitivity are shown for each tool. For multi-process operations, CPU time has been converted to single-process usage.

Format: XLSX Size: 10KB

Additional file 5:

Table S4 - detection screen of six tools on two previous study datasets.

Format: XLSX Size: 14KB

Additional file 6:

Tables S5, S6 and S7. Table S5: detailed information on simulated RNA-Seq reads. Table S6: list of 150 simulated fusion events. Table S7: number of fusion-supporting reads for each fusion event.

Format: XLSX Size: 52KB

Additional file 7:

Tables S8 and S9. Table S8: TP and FP rates of SOAPfuse, deFuse and TopHat-Fusion based on simulated datasets. Table S9: detailed information on the simulated fusion events detected by SOAPfuse, deFuse and TopHat-Fusion.

Format: XLSX Size: 27KB

Additional file 8:

Tables S10 and S11. Table S10: fusion transcripts detected by SOAPfuse and deFuse in two bladder cancer cell lines. Table S11: primers and Sanger sequences of confirmed fusions in two bladder cancer cell lines.

Format: XLSX Size: 16KB

Additional file 9:

Figure S1 - models of fusion transcripts generated by genome rearrangement. (a) Fusion transcript created by genomic inversion of Gene A and Gene B, which are from different DNA strands. (b) Fusion transcript formed by genomic translocation in which Gene C and Gene D are from the same DNA strand and are far from each other.

Format: PDF Size: 1MB

Additional file 10:

Figure S2 - schematic diagrams of nine steps in the SOAPfuse pipeline. The SOAPfuse algorithm consists of nine steps (from S01 to S09) and details of each step are in the Materials and methods or Additional file 3.

Format: PDF Size: 230KB

Additional file 11:

Table S12 - sixteen combinations of span-reads. There are sixteen combinations based on the serial numbers of the reads and their mapped orientations, but only four combinations are rational, supporting two types of fusions in which the upstream and downstream genes are different.

Format: XLSX Size: 10KB

Additional file 12:

Figure S3 - schematic diagrams of fusion event RECK-ALX3. (a) Alignment of supporting reads against the predicted junction sequence. The upstream part of the junction sequence is in green, and the downstream part is in red. Span-reads are displayed above the predicted junction sequence, with a colored dotted line linking paired-end reads. Junc-reads are shown below the junction sequence. (b,c) Expression analysis of the exons in RECK and ALX3 by RNA-Seq read coverage. Transcripts of RECK and ALX3 are shown below the coordinates. The junction site is shown as a red dot, and a green arrow indicates the transcript orientation in the genome sequence. The region covered by the red line is the region mapped by supporting reads. In this case, the expression levels of the RECK and ALX3 exons on either side of the junction sites differ significantly: the exons involved in the fusion transcript are expressed more highly than the others.

Format: ZIP Size: 165KB