Open Access Highly Accessed Method

TopHat-Fusion: an algorithm for discovery of novel fusion transcripts

Daehwan Kim1* and Steven L Salzberg1,2,3

Author Affiliations

1 Center for Bioinformatics and Computational Biology, 3115 Biomolecular Sciences Building #296, University of Maryland, College Park, MD 20742, USA

2 McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Broadway Research Building, 733 N Broadway, Baltimore, MD 21205, USA

3 Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

Genome Biology 2011, 12:R72  doi:10.1186/gb-2011-12-8-r72

Published: 11 August 2011

Additional files

Additional file 1:

Table S1 - 76 candidate fusions including multiple fusion points in the breast cancer cell lines. Additional details for the 76 fusions detected by TopHat-Fusion in the breast cancer cell lines (BT474, SKBR3, KPL4, MCF7). Some of the genes contain multiple fusion points, presumably due to alternative splicing.

Format: PDF Size: 26KB

This file can be viewed with: Adobe Acrobat Reader

Open Data

Additional file 2:

Table S2 - 19 candidate fusions including multiple fusion points in the prostate cancer cell line. Nineteen fusion genes detected by TopHat-Fusion in a prostate cancer cell line (VCaP), including several with multiple fusion points due to alternative splicing.

Format: PDF Size: 17KB

Additional file 3:

Figure S1 - read distributions around BCR-ABL1 fusion for single-end and paired-end reads. This figure shows read distributions around the BCR-ABL1 fusion gene in Universal Human Reference (UHR) data. (a) The read distribution for single-end reads (100 bp or less). (b) Read distribution for paired-end reads (50 bp) from 300-bp fragments. Coverage was similar with either data set.

Format: PDF Size: 30KB

Additional file 4:

Table S3 - the top 20 fusion candidates reported by TopHat-Fusion in the UHR data. The top 20 fusion genes from the Universal Human Reference (UHR) data found by TopHat-Fusion, sorted by the scoring scheme described in Figure 6. Single- and paired-end reads were used separately in order to compare TopHat's ability to find fusions using only single-end reads.

Format: PDF Size: 17KB

Additional file 5:

Table S4 - 45 fusion candidates reported by TopHat-Fusion in Illumina Body Map 2.0 data. Using two samples (testes and thyroid) from Illumina Body Map 2.0 data, TopHat-Fusion reports 45 fusions.

Format: PDF Size: 21KB

Additional file 6:

List of 14,510 fusion candidates reported by FusionSeq for MCF7 sample data.

Format: TXT Size: 19.5MB

Additional file 7:

Table S5 - 42 fusion candidates reported by TopHat-Fusion in SKBR3 and MCF7 cell lines. Twenty-eight and fourteen candidate fusions are reported in SKBR3 and MCF7 samples, respectively, when the filtering parameters are changed to one spanning read and two supporting mate pairs.

Format: PDF Size: 19KB
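
The relaxed filter described for Table S5 can be illustrated with a minimal sketch. It assumes that "one spanning read and two supporting mate pairs" are minimum thresholds; the `FusionCandidate` record, its field names, and `filter_candidates` are hypothetical and are not part of TopHat-Fusion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FusionCandidate:
    # Hypothetical record; field names are illustrative only.
    gene_left: str
    gene_right: str
    spanning_reads: int      # reads that cross the fusion point
    supporting_pairs: int    # mate pairs with one mate on each side


def filter_candidates(
    candidates: List[FusionCandidate],
    min_spanning_reads: int = 1,
    min_supporting_pairs: int = 2,
) -> List[FusionCandidate]:
    """Keep candidates that meet both thresholds (assumed to be minimums)."""
    return [
        c
        for c in candidates
        if c.spanning_reads >= min_spanning_reads
        and c.supporting_pairs >= min_supporting_pairs
    ]


# Placeholder gene names, not real results.
candidates = [
    FusionCandidate("GENE_A", "GENE_B", spanning_reads=1, supporting_pairs=2),
    FusionCandidate("GENE_C", "GENE_D", spanning_reads=0, supporting_pairs=5),
    FusionCandidate("GENE_E", "GENE_F", spanning_reads=3, supporting_pairs=1),
]
kept = filter_candidates(candidates)
```

Only the first placeholder candidate passes here, because the other two each miss one of the two thresholds.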

Additional file 8:

List of 275 fusion candidates reported by deFuse in MCF7 sample data.

Format: TXT Size: 180KB

Additional file 9:

List of 1,395 fusion candidates reported by deFuse in SKBR3 sample data.

Format: TXT Size: 1MB

Additional file 10:

Supplementary methods.

Format: DOC Size: 38KB

This file can be viewed with: Microsoft Word Viewer

Additional file 11:

Figure S2 - Finding fusions using two segments and partner reads in paired-end reads. (a) TopHat allows one to three mismatches when mapping segments using Bowtie, which enables segments to be mapped even if a few bases cross a fusion point (the last two bases of the red segment, GG). These two segments, mapped to two different chromosomes, are used to identify a fusion point. (b) For paired-end reads, the mapped position of the partner read is used to narrow down the range of a fusion point. The second segment (shown in green) cannot be mapped because it spans a fusion point. Here, its partner read is mapped and the fusion point is likely to be located within the inner mate distance ± standard deviation of the left genomic coordinate of the partner read. TopHat-Fusion is able to use this relatively small range to efficiently map the right part of the second segment to the right side of a fusion (case 2). The left part of the second segment is aligned to the right side of the mapped first segment (case 3).

Format: PDF Size: 33KB
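
The partner-read window described in panel (b) of Figure S2 can be sketched as a small calculation. This is a literal reading of the caption under stated assumptions: the partner read maps downstream of the fusion point on the forward strand, so the window lies to the left of the partner's left coordinate. The function name and parameters are hypothetical, and the actual TopHat-Fusion search may differ.

```python
from typing import Tuple


def fusion_search_window(
    partner_left: int, inner_mate_dist: int, std_dev: int
) -> Tuple[int, int]:
    """Return the (start, end) range in which to look for a fusion point.

    Assumption: the fusion point sits roughly one inner mate distance
    to the left of the partner read, give or take one standard deviation.
    """
    start = partner_left - (inner_mate_dist + std_dev)
    end = partner_left - (inner_mate_dist - std_dev)
    # Clamp to the start of the chromosome.
    return max(start, 0), max(end, 0)


window = fusion_search_window(partner_left=10_000, inner_mate_dist=200, std_dev=20)
print(window)  # (9780, 9820)
```

With these illustrative numbers the search is confined to 40 bases, which is the kind of narrowing that makes mapping the remainder of the unmapped segment inexpensive.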

Additional file 12:

Figure S3 - stitching segments to produce a full read alignment. (a) The segment in the third row for segment 1 and the one in the first row for segment 2 are connected because they are on the same chromosome (i), in the forward direction, and have adjacent coordinates. These are then matched to the second row for segment 3 and glued together, producing the full-length read alignment at the bottom. (b) TopHat-Fusion tries to connect the segment in the second row for segment 1 with segments in the first and second rows for segment 2, but neither attempt succeeds. Case 1 would require two fusion points in the same read, and case 2 cannot be fused with consistent coordinates. (c) TopHat-Fusion also attempts to connect the segment in the second row for segment 2 with the one in the first row for segment 3: in case 3 there is no intron available, in case 4 there is no fusion, and case 5 would require more than one fusion.

Format: PDF Size: 30KB
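
The stitching rules in Figure S3 can be illustrated with a simplified sketch. It is not the TopHat-Fusion implementation: it treats each junction between consecutive segments as adjacent, intron, or fusion, and rejects a read that would need more than one fusion. The `Segment` type, the forward-strand-only check, and the `max_intron` value are assumptions made for illustration, and the coordinate-consistency check from case 2 is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    # Hypothetical mapped segment; coordinates are 0-based, end exclusive.
    chrom: str
    start: int
    end: int
    strand: str = "+"


def junction_type(a: Segment, b: Segment, max_intron: int = 100_000) -> str:
    """Classify how segment b follows segment a (forward strand only)."""
    if a.chrom == b.chrom and a.strand == "+" and b.strand == "+":
        gap = b.start - a.end
        if gap == 0:
            return "adjacent"
        if 0 < gap <= max_intron:
            return "intron"
    # Anything else is treated as a candidate fusion in this sketch.
    return "fusion"


def stitch(segments: List[Segment], max_fusions: int = 1) -> Optional[List[str]]:
    """Return the junction types for a read, or None if it needs too many fusions."""
    junctions = [junction_type(a, b) for a, b in zip(segments, segments[1:])]
    if junctions.count("fusion") > max_fusions:
        return None
    return junctions


same_chrom = [Segment("chr1", 100, 125), Segment("chr1", 125, 150), Segment("chr1", 150, 175)]
one_fusion = [Segment("chr1", 100, 125), Segment("chr2", 500, 525), Segment("chr2", 525, 550)]
two_fusions = [Segment("chr1", 100, 125), Segment("chr2", 500, 525), Segment("chr3", 900, 925)]
```

Here `stitch(same_chrom)` corresponds to the successful connection in panel (a), while `stitch(two_fusions)` returns `None`, mirroring the cases that would require more than one fusion point in a single read.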
