RNA Fusion — step2 FusionCatcher
1. Overview
Reference: https://www.biorxiv.org/content/biorxiv/early/2014/11/19/011650.full.pdf
GitHub repository: https://github.com/ndaniel/fusioncatcher
2. Installation
The installation commands are given on GitHub:
git clone https://github.com/ndaniel/fusioncatcher
cd fusioncatcher/tools/
./install_tools.sh
cd ../data
./download-human-db.sh
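Because install_tools.sh installs a large number of tools, running it in one go often fails partway through. In that case, split it into one command per tool and install each tool separately: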
cd /Fusioncatcherdir/fusioncatcher/tools && wget https://github.com/BenLangmead/bowtie/releases/download/v1.2.3/bowtie-1.2.3-linux-x86_64.zip -O bowtie-1.2.3-linux-x86_64.zip --no-check-certificate && unzip bowtie-1.2.3-linux-x86_64.zip && ln -s bowtie-1.2.3-linux-x86_64 bowtie
cd /Fusioncatcherdir/fusioncatcher/tools && wget https://github.com/BenLangmead/bowtie2/releases/download/v2.3.5.1/bowtie2-2.3.5.1-linux-x86_64.zip -O bowtie2-2.3.5.1-linux-x86_64.zip --no-check-certificate && unzip bowtie2-2.3.5.1-linux-x86_64.zip && ln -s bowtie2-2.3.5.1-linux-x86_64 bowtie2
cd /Fusioncatcherdir/fusioncatcher/tools && wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.9.6/sratoolkit.2.9.6-centos_linux64.tar.gz -O sratoolkit.2.9.6-centos_linux64.tar.gz --no-check-certificate && tar --overwrite -xvzf sratoolkit.2.9.6-centos_linux64.tar.gz && ln -s sratoolkit.2.9.6-centos_linux64 sratoolkit
cd /Fusioncatcherdir/fusioncatcher/tools && wget http://zlib.net/pigz/pigz-2.4.tar.gz -O pigz-2.4.tar.gz --no-check-certificate && tar --overwrite -xvzf pigz-2.4.tar.gz && make -C pigz-2.4 && ln -s pigz-2.4 pigz
cd /Fusioncatcherdir/fusioncatcher/tools && mkdir liftover && wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver -O liftover/liftOver --no-check-certificate && chmod +x liftover/liftOver
cd /Fusioncatcherdir/fusioncatcher/tools && mkdir blat && wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/blat -O blat/blat --no-check-certificate && chmod +x blat/blat
cd /Fusioncatcherdir/fusioncatcher/tools && mkdir fatotwobit && wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faToTwoBit -O fatotwobit/faToTwoBit --no-check-certificate && chmod +x fatotwobit/faToTwoBit
cd /Fusioncatcherdir/fusioncatcher/tools && wget http://github.com/ndaniel/seqtk/archive/1.2-r101c.tar.gz -O 1.2-r101c.tar.gz --no-check-certificate && tar --overwrite -xvzf 1.2-r101c.tar.gz -C . && make -C seqtk-1.2-r101c && chmod +x seqtk-1.2-r101c/seqtk && ln -s seqtk-1.2-r101c seqtk
cd /Fusioncatcherdir/fusioncatcher/tools && wget https://github.com/alexdobin/STAR/archive/2.7.2b.tar.gz -O 2.7.2b.tar.gz --no-check-certificate && tar --overwrite -xvzf 2.7.2b.tar.gz -C . && cp -f STAR-2.7.2b/bin/Linux_x86_64_static/STAR STAR-2.7.2b/source/STAR && ln -s STAR-2.7.2b star
cd /Fusioncatcherdir/fusioncatcher/tools && wget https://sourceforge.net/projects/bbmap/files/BBMap_38.44.tar.gz -O BBMap_38.44.tar.gz --no-check-certificate && tar --overwrite -xvzf BBMap_38.44.tar.gz -C . && mv bbmap BBMap_38.44 && ln -s BBMap_38.44 bbmap && chmod +x bbmap/*.sh
cd /Fusioncatcherdir/fusioncatcher/tools && mkdir picard && wget https://github.com/broadinstitute/picard/releases/download/2.21.2/picard.jar -O picard/picard.jar --no-check-certificate && chmod +x picard/picard.jar
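When the tools are installed one by one, it is easy to miss one. The following is a minimal sketch that checks whether each path created by the commands above exists. The function name check_fusioncatcher_tools is made up for illustration, and the path list simply mirrors the symlinks and files from those commands.

```shell
# Report which of the tool paths created by the commands above are missing.
# The list mirrors the symlinks/files from those commands; adjust it if your
# install differs. The function name is hypothetical.
check_fusioncatcher_tools() {
    tools_dir="$1"
    missing=0
    for path in bowtie bowtie2 sratoolkit pigz liftover/liftOver blat/blat \
                fatotwobit/faToTwoBit seqtk star bbmap picard/picard.jar; do
        if [ -e "$tools_dir/$path" ]; then
            echo "OK       $path"
        else
            echo "MISSING  $path"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status when at least one path is absent.
    [ "$missing" -eq 0 ]
}

# Example (directory taken from the commands above):
# check_fusioncatcher_tools /Fusioncatcherdir/fusioncatcher/tools
```

Any line marked MISSING points to the per-tool command that needs to be rerun.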
The download-human-db.sh step downloads the human database and also takes a considerable amount of time.
3. Usage
/Fusioncatcherdir/fusioncatcher/bin/fusioncatcher -d /Fusioncatcherdir/fusioncatcher/data/current/ -i /SampleFQ/ -o /Outdir/
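For more than one sample, the same command can be generated once per sample. The sketch below only prints the commands (a dry run) and uses just the -d, -i and -o options shown above. It assumes the FASTQ files are grouped in one subdirectory per sample, which is a hypothetical layout; the function name print_fusioncatcher_cmds is also made up for illustration.

```shell
# Print one FusionCatcher command per sample subdirectory (dry run).
# Assumption: FASTQ files are grouped as <fastq_root>/<sample>/; this layout
# is hypothetical. Only the -d, -i and -o options from the command above are used.
print_fusioncatcher_cmds() {
    fc_bin="$1"; db_dir="$2"; fastq_root="$3"; out_root="$4"
    for sample_dir in "$fastq_root"/*/; do
        [ -d "$sample_dir" ] || continue
        sample=$(basename "$sample_dir")
        echo "$fc_bin -d $db_dir -i $sample_dir -o $out_root/$sample/"
    done
}

# Example (paths taken from the command above):
# print_fusioncatcher_cmds /Fusioncatcherdir/fusioncatcher/bin/fusioncatcher \
#     /Fusioncatcherdir/fusioncatcher/data/current/ /SampleFQ /Outdir
```

After reviewing the printed commands, they can be piped to sh or pasted into a job script.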
This produces the following files:
-rw-r--r-- 1 test test 21927 Oct 30 14:19 final-list_candidate-fusion-genes.caption.md.txt
-rw-r--r-- 1 test test 3696 Oct 30 14:19 final-list_candidate-fusion-genes.hg19.txt ### final result
-rw-r--r-- 1 test test 19628 Oct 30 14:19 final-list_candidate-fusion-genes_sequences.txt.zip
-rw-r--r-- 1 test test 3696 Oct 30 14:19 final-list_candidate-fusion-genes.txt
-rw-r--r-- 1 test test 10056 Oct 30 14:19 final-list_candidate-fusion-genes.vcf
-rw-r--r-- 1 test test 417611 Oct 30 14:19 fusioncatcher.log
-rw-r--r-- 1 test test 2371485 Oct 30 14:19 info.txt
-rw-r--r-- 1 test test 13144 Oct 30 14:19 junk-chimeras.txt
-rw-r--r-- 1 test test 806 Oct 30 14:19 summary_candidate_fusions.txt
-rw-r--r-- 1 test test 11794 Oct 30 14:17 supporting-reads_gene-fusions_BLAT.zip
-rw-r--r-- 1 test test 160764 Oct 30 14:15 supporting-reads_gene-fusions_BOWTIE.zip
-rw-r--r-- 1 test test 22 Oct 30 14:18 supporting-reads_gene-fusions_STAR.zip
-rw-r--r-- 1 test test 112 Oct 30 14:13 viruses_bacteria_phages.txt
4. Interpreting the results
$ cat final-list_candidate-fusion-genes.hg19.txt
Gene_1_symbol(5end_fusion_partner) Gene_2_symbol(3end_fusion_partner) Fusion_description Counts_of_common_mapping_reads Spanning_pairs Spanning_unique_reads Longest_anchor_found Fusion_finding_method Fusion_point_for_gene_1(5end_fusion_partner) Fusion_point_for_gene_2(3end_fusion_partner) Gene_1_id(5end_fusion_partner) Gene_2_id(3end_fusion_partner) Exon_1_id(5end_fusion_partner) Exon_2_id(3end_fusion_partner) Fusion_sequence Predicted_effect
BCAS4 BCAS3 known,chimer2,cancer,tumor,tcga-cancer,mitelman,exon-exon 0 102 19 24 BOWTIE 20:49411710:+ 17:59445688:+ ENSG00000124243 ENSG00000141376 ENSE00001952820 ENSE00003624722 GCTCGCGCTCTTCCTGACCCCCGAGCCTGGGGCCGAG*GTACCTTTGACAGGAGCGTGACCCTGCTGGAGGTGTG out-of-frame
BCAS4 BCAS3 known,chimer2,cancer,tumor,tcga-cancer,mitelman,exon-exon 0 102 4 24 BOWTIE 20:49411710:+ 17:59430949:+ ENSG00000124243 ENSG00000141376 ENSE00001952820 ENSE00002919761 GCTCGCGCTCTTCCTGACCCCCGAGCCTGGGGCCGAG*GGGTGTTGTGAGGATTCATGGAGCAAATGGCTGTGAA CDS(truncated)/UTR
ARFGEF2 SULF2 known,cell_lines,cancer,mitelman,t2,exon-exon 0 13 11 25 BOWTIE 20:47538547:+ 20:46365686:- ENSG0000012419 ENSG00000196562 ENSE00001456459 ENSE00003617788 CACTCCCAGCTGCGCAGGGCCTGCCAGGTGGCGCTCG*GTTCCATGCAGGTGATGAACAAGACCCGGCGCATCAT in-frame
RPS6KB1 VMP1 known,ribosomal,chimer2,tcga,cell_lines,18cancers,tcga-cancer,mitelman,t3,oesophagus,10K<gap<100K,exon-exon 0 5 5 24 BOWTIE 17:57992064:+ 17:57917129:+ ENSG00000108443 ENSG00000062716 ENSE00003567383 ENSE00002943049 TACTGGGAAAATATTTGCCATGAAGGTGCTTAAAAAG*GGAGAAAACTGGTTGTCCTGGATGTTTGAAAAGTTGG in-frame
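The columns of main interest are:
Gene_1_symbol(5end_fusion_partner): the upstream (5' end) gene of the fusion
Gene_2_symbol(3end_fusion_partner): the downstream (3' end) gene of the fusion
Spanning_pairs: number of spanning read pairs supporting the fusion
Spanning_unique_reads: number of unique reads supporting the fusion
Fusion_point_for_gene_1(5end_fusion_partner): breakpoint position in the upstream gene (chromosome:position:strand)
Fusion_point_for_gene_2(3end_fusion_partner): breakpoint position in the downstream gene (chromosome:position:strand)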
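To look at only these six columns, they can be selected by header name. The following is a minimal sketch, assuming the result file is tab-separated and its first line is the header shown above. The function name show_key_columns is made up for illustration.

```shell
# Print only the six columns discussed above from a FusionCatcher result table.
# Assumes a tab-separated file whose first line is the header shown earlier.
# The function name is hypothetical.
show_key_columns() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            n = split("Gene_1_symbol(5end_fusion_partner),Gene_2_symbol(3end_fusion_partner),Spanning_pairs,Spanning_unique_reads,Fusion_point_for_gene_1(5end_fusion_partner),Fusion_point_for_gene_2(3end_fusion_partner)", want, ",")
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) pos[$i] = i
            for (k = 1; k <= n; k++) {
                if (!(want[k] in pos)) {
                    print "missing column: " want[k] > "/dev/stderr"
                    exit 1
                }
            }
        }
        {
            line = ""
            for (k = 1; k <= n; k++) {
                sep = (k > 1 ? OFS : "")
                line = line sep $(pos[want[k]])
            }
            print line
        }
    ' "$1"
}

# Example:
# show_key_columns final-list_candidate-fusion-genes.hg19.txt
```

Selecting by name rather than by position keeps the sketch working if the column order ever differs.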
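These columns are then used to filter the fusion genes and to annotate the regions. FusionCatcher usually reports a large number of candidates, so the results need to be filtered on the Spanning_pairs and Spanning_unique_reads columns. For region annotation, annovar is sufficient; it can annotate down to the transcript and exon of each gene.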
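The read-support filter can be sketched with awk as follows. It assumes a tab-separated result file with the header line shown above. The function name filter_fusions and the default thresholds (3 spanning pairs, 2 unique reads) are illustrative assumptions, not values recommended by FusionCatcher; choose cutoffs that suit your data.

```shell
# Keep the header plus rows that meet minimum read-support thresholds.
# Assumes a tab-separated FusionCatcher result file with a header line.
# Defaults (3 pairs, 2 unique reads) are illustrative, not recommendations.
filter_fusions() {
    awk -F'\t' -v mp="${2:-3}" -v mu="${3:-2}" '
        NR == 1 {
            # Locate the two support columns by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "Spanning_pairs") p = i
                if ($i == "Spanning_unique_reads") u = i
            }
            if (!p || !u) {
                print "required columns not found" > "/dev/stderr"
                exit 1
            }
            print
            next
        }
        ($p + 0) >= mp && ($u + 0) >= mu
    ' "$1"
}

# Example: require at least 5 spanning pairs and 3 unique reads.
# filter_fusions final-list_candidate-fusion-genes.hg19.txt 5 3 > filtered.txt
```

The filtered table keeps all original columns, so the breakpoint columns remain available for the annotation step.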