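RNA 16. Visualizing Fusion Genes in SCI Papers
Preface
In the previous post we looked at fusion-gene databases, which can be used directly. Sometimes, though, we still need to call fusions on new data. In general, fusions can be detected from either RNA or DNA, so why is RNA generally the recommended choice for fusion detection?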
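I. Fusion Gene Mechanisms
Let's start with a schematic of a fusion gene. The example is BCR-ABL, a fusion found in blood cancers: it arises from a translocation between chromosomes 9 and 22 and leads to disease.
Three common mechanisms give rise to gene fusions:
1) Chromosomal translocation. In panel A of the figure below, segments of chromosomes 1 and 2 are exchanged, so the light-green gene on chromosome 1 ends up fused to the orange gene on chromosome 2;
2) Interstitial deletion. In the figure below, the region between the orange gene and the light-green gene on chromosome 3 is deleted, which fuses the two genes;
3) Chromosomal inversion. In the figure below, the segment from the orange gene to the dark-green gene on chromosome 4 is inverted, which fuses the orange gene to the light-green gene.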
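II. Fusion Gene Detection
Fusion detection from high-throughput sequencing data involves three main steps:
1) Align the reads to the reference genome with STAR and keep the split and discordant reads as candidate fusion evidence;
2) Align the candidate fusion sequences against the reference gene sequences and predict fusions from the overlaps;
3) Filter the predictions to remove false positives.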
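To make the "split" versus "discordant" distinction in the first step concrete, here is a toy sketch. The input format is invented for illustration (it is not a STAR output format), and classify_pairs is a hypothetical helper: each line describes one read pair by the gene or genes each mate covers, with a mate that crosses a junction written as "geneA|geneB".

```shell
#!/usr/bin/env bash
# Toy input (hypothetical format, NOT produced by STAR): one read pair per line,
# columns = pair id, gene(s) covered by mate 1, gene(s) covered by mate 2.
# A mate whose alignment is split across two genes is written "geneA|geneB".
classify_pairs() {
  awk -F'\t' '{
    if (index($2, "|") > 0 || index($3, "|") > 0)
      label = "split"        # one mate itself crosses the junction
    else if ($2 != $3)
      label = "discordant"   # the two mates land in different genes
    else
      label = "concordant"   # ordinary pair, not fusion evidence
    print $1 "\t" label
  }' "$1"
}

# Small demo with three made-up pairs around BCR and ABL1.
demo=$(mktemp)
printf 'pair1\tBCR\tBCR\n'        >  "$demo"
printf 'pair2\tBCR\tABL1\n'       >> "$demo"
printf 'pair3\tBCR|ABL1\tABL1\n'  >> "$demo"
classify_pairs "$demo"
rm -f "$demo"
```

Real callers work from alignment records rather than a table like this; the sketch only shows which kinds of pairs count as evidence.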
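Here we use STAR-Fusion, developed in Aviv Regev's lab, mainly because companion visualization tools were later built around it, which makes the walkthrough easier.
1. Download the test data
First, download the test data; STAR-Fusion runs its analysis starting from the raw paired-end reads: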
https://codeload.github.com/STAR-Fusion/STAR-Fusion-Tutorial/zip/refs/tags/v0.0.1
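After downloading, take a look at what is in the archive: the raw reads, a test shell script, a genome file (.fa), and a gene annotation file (.gtf).
2. Install the software
There are two ways to install the package. One is to clone it directly with git, which depends on your network speed; the other is to download a release and build it locally with make.
a. Installing from a GitHub clone: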
% git clone --recursive https://github.com/STAR-Fusion/STAR-Fusion.git
b. Downloading a STAR-Fusion Release (Preferred)
Visit https://github.com/STAR-Fusion/STAR-Fusion/releases
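After downloading, unpack the .tar.gz file, change into the directory, and run 'make' to complete the installation.
3. Build the reference lib
First build the reference lib for your reference genome. At a minimum this needs the genome's fasta file and gtf file; you can also supply annotations of known fusion genes and the like.
For human and mouse, prebuilt libraries are available at: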
https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/
The directory listing looks like this:
Build the library as follows:
FusionFilter/prep_genome_lib.pl \
--genome_fa ref_genome.fa \
--gtf ref_annot.gtf \
--fusion_annot_lib CTAT_HumanFusionLib.dat.gz \
--annot_filter_rule AnnotFilterRule.pm \
--pfam_db PFAM.domtblout.dat.gz
4. Run the detection
With the library built and the downloaded fastq files in hand, run STAR-Fusion as follows:
cd testing/
STAR-Fusion --left_fq reads_1.fq.gz --right_fq reads_2.fq.gz \
-O star_fusion_outdir \
--genome_lib_dir /path/to/your/CTAT_resource_lib \
--verbose_level 2
5. Output
#FusionName JunctionReadCount SpanningFragCount SpliceType LeftGene LeftBreakpoint RightGene RightBreakpoint LargeAnchorSupport FFPM LeftBreakDinuc LeftBreakEntropy RightBreakDinuc RightBreakEntropy annots
THRA--AC090627.1 27 93 ONLY_REF_SPLICE THRA^ENSG00000126351.8 chr17:38243106:+ AC090627.1^ENSG00000235300.3 chr17:46371709:+ YES_LDAS 23875.8456 GT 1.8892 AG 1.9656 ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr17:8.12Mb]"]
THRA--AC090627.1 5 93 ONLY_REF_SPLICE THRA^ENSG00000126351.8 chr17:38243106:+ AC090627.1^ENSG00000235300.3 chr17:46384693:+ YES_LDAS 19498.6072 GT 1.8892 AG 1.4295 ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr17:8.12Mb]"]
ACACA--STAC2 12 52 ONLY_REF_SPLICE ACACA^ENSG00000132142.15 chr17:35479453:- STAC2^ENSG00000141750.6 chr17:37374426:- YES_LDAS 12733.7844 GT 1.9656 AG 1.9656 ["ChimerSeq","CCLE","Klijn_CellLines","FA_CancerSupp","INTRACHROMOSOMAL[chr17:1.60Mb]"]
RPS6KB1--SNF8 10 43 ONLY_REF_SPLICE RPS6KB1^ENSG00000108443.9 chr17:57970686:+ SNF8^ENSG00000159210.5 chr17:47021337:- YES_LDAS 10545.1651 GT 1.3753 AG 1.8323 ["Klijn_CellLines","FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr17:10.95Mb]"]
TOB1--SYNRG 8 30 ONLY_REF_SPLICE TOB1^ENSG00000141232.4 chr17:48943419:- SYNRG^ENSG00000006114.11 chr17:35880751:- YES_LDAS 7560.6844 GT 1.4566 AG 1.8892 ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:12.97Mb]"]
VAPB--IKZF3 4 46 ONLY_REF_SPLICE VAPB^ENSG00000124164.11 chr20:56964573:+ IKZF3^ENSG00000161405.12 chr17:37934020:- YES_LDAS 9948.269 GT 1.9656 AG 1.7819 ["FA_CancerSupp","Klijn_CellLines","CCLE","ChimerSeq","ChimerPub","INTERCHROMOSOMAL[chr20--chr17]"]
ZMYND8--CEP250 2 44 ONLY_REF_SPLICE ZMYND8^ENSG00000101040.15 chr20:45852970:- CEP250^ENSG00000126001.11 chr20:34078463:+ NO_LDAS 9152.4075 GT 1.8295 AG 1.8062 ["FA_CancerSupp","CCLE","ChimerSeq","INTRACHROMOSOMAL[chr20:11.74Mb]"]
AHCTF1--NAAA 3 38 ONLY_REF_SPLICE AHCTF1^ENSG00000153207.10 chr1:247094880:- NAAA^ENSG00000138744.10 chr4:76846964:- YES_LDAS 8157.5805 GT 1.7232 AG 1.8062 ["FA_CancerSupp","CCLE","INTERCHROMOSOMAL[chr1--chr4]"]
VAPB--IKZF3 1 46 ONLY_REF_SPLICE VAPB^ENSG00000124164.11 chr20:56964573:+ IKZF3^ENSG00000161405.12 chr17:37922746:- NO_LDAS 9351.3729 GT 1.9656 AG 1.9329 ["FA_CancerSupp","Klijn_CellLines","CCLE","ChimerSeq","ChimerPub","INTERCHROMOSOMAL[chr20--chr17]"]
VAPB--IKZF3 1 46 ONLY_REF_SPLICE VAPB^ENSG00000124164.11 chr20:56964573:+ IKZF3^ENSG00000161405.12 chr17:37944627:- NO_LDAS 9351.3729 GT 1.9656 AG 1.8892 ["FA_CancerSupp","Klijn_CellLines","CCLE","ChimerSeq","ChimerPub","INTERCHROMOSOMAL[chr20--chr17]"]
STX16--RAE1 4 33 ONLY_REF_SPLICE STX16^ENSG00000124222.17 chr20:57227143:+ RAE1^ENSG00000101146.8 chr20:55929088:+ YES_LDAS 7361.719 GT 1.9899 AG 1.9656 ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr20:1.27Mb]"]
AHCTF1--NAAA 1 38 ONLY_REF_SPLICE AHCTF1^ENSG00000153207.10 chr1:247094431:- NAAA^ENSG00000138744.10 chr4:76846964:- NO_LDAS 7759.6498 GT 1.9086 AG 1.8062 ["FA_CancerSupp","CCLE","INTERCHROMOSOMAL[chr1--chr4]"]
STX16-NPEPL1--RAE1 4 24 INCL_NON_REF_SPLICE STX16-NPEPL1^ENSG00000254995.4 chr20:57227143:+ RAE1^ENSG00000101146.8 chr20:55929088:+ YES_LDAS 5571.0306 GT 1.9899 AG 1.9656 INTRACHROMOSOMAL[chr20:1.27Mb]
RAB22A--MYO9B 6 11 ONLY_REF_SPLICE RAB22A^ENSG00000124209.3 chr20:56886178:+ MYO9B^ENSG00000099331.9 chr19:17256207:+ YES_LDAS 3382.4115 GT 1.6895 AG 1.9656 ["FA_CancerSupp","ChimerSeq","CCLE","INTERCHROMOSOMAL[chr20--chr19]"]
MED1--ACSF2 4 11 ONLY_REF_SPLICE MED1^ENSG00000125686.7 chr17:37595418:- ACSF2^ENSG00000167107.8 chr17:48548389:+ YES_LDAS 2984.4807 GT 1.9656 AG 1.9656 ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:10.90Mb]"]
MED13--BCAS3 2 12 ONLY_REF_SPLICE MED13^ENSG00000108510.5 chr17:60129898:- BCAS3^ENSG00000141376.16 chr17:59469338:+ YES_LDAS 2785.5154 GT 1.5546 AG 1.9086 ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:0.55Mb]"]
MED1--STXBP4 1 15 ONLY_REF_SPLICE MED1^ENSG00000125686.7 chr17:37607291:- STXBP4^ENSG00000166263.9 chr17:53218671:+ NO_LDAS 3183.4461 GT 1.3996 AG 1.7968 ["CCLE","FA_CancerSupp","Klijn_CellLines","INTRACHROMOSOMAL[chr17:15.44Mb]"]
MED13--BCAS3 1 12 ONLY_REF_SPLICE MED13^ENSG00000108510.5 chr17:60129898:- BCAS3^ENSG00000141376.16 chr17:59465979:+ NO_LDAS 2586.55 GT 1.5546 AG 0.8366 ["FA_CancerSupp","CCLE","INTRACHROMOSOMAL[chr17:0.55Mb]"]
STARD3--DOK5 2 7 ONLY_REF_SPLICE STARD3^ENSG00000131748.11 chr17:37793484:+ DOK5^ENSG00000101134.7 chr20:53259997:+ NO_LDAS 1790.6885 GT 1.8892 AG 1.9656 ["FA_CancerSupp","CCLE","INTERCHROMOSOMAL[chr17--chr20]"]
DIDO1--TTI1 1 10 ONLY_REF_SPLICE DIDO1^ENSG00000101191.12 chr20:61569148:- TTI1^ENSG00000101407.8 chr20:36642259:- NO_LDAS 2188.6192 GT 1.6402 AG 1.9329 ["FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr20:24.85Mb]"]
DIDO1--TTI1 1 10 ONLY_REF_SPLICE DIDO1^ENSG00000101191.12 chr20:61569148:- TTI1^ENSG00000101407.8 chr20:36634799:- NO_LDAS 2188.6192 GT 1.6402 AG 1.8892 ["FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr20:24.85Mb]"]
BRD4--RFX1 1 8 ONLY_REF_SPLICE BRD4^ENSG00000141867.13 chr19:15443101:- RFX1^ENSG00000132005.4 chr19:14109129:- NO_LDAS 1790.6884 GT 1.9086 AG 1.8892 ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr19:1.23Mb]"]
BRD4--RFX1 1 8 ONLY_REF_SPLICE BRD4^ENSG00000141867.13 chr19:15443101:- RFX1^ENSG00000132005.4 chr19:14094407:- NO_LDAS 1790.6884 GT 1.9086 AG 1.8295 ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr19:1.23Mb]"]
TRPC4AP--MRPL45 1 8 ONLY_REF_SPLICE TRPC4AP^ENSG00000100991.7 chr20:33665849:- MRPL45^ENSG00000174100.5 chr17:36478009:+ NO_LDAS 1790.6884 GT 1.6895 AG 1.9086 ["CCLE","Klijn_CellLines","FA_CancerSupp","INTERCHROMOSOMAL[chr20--chr17]"]
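A table like this is usually narrowed down before plotting. The sketch below keeps only rows with enough junction reads and large-anchor support. It assumes the file is tab-delimited with the column order shown in the header above; filter_fusions and the threshold of 2 are illustrative choices, not STAR-Fusion defaults.

```shell
#!/usr/bin/env bash
# Keep the header plus rows with enough junction reads and large-anchor support.
# Column positions follow the header shown above:
#   2 = JunctionReadCount, 9 = LargeAnchorSupport.
# The default threshold is an illustrative choice, not a STAR-Fusion default.
filter_fusions() {
  local tsv=$1 min_junction=${2:-2}
  awk -F'\t' -v min_j="$min_junction" \
    'NR == 1 || ($2 + 0 >= min_j + 0 && $9 == "YES_LDAS")' "$tsv"
}

# Demo on three rows abridged from the table above (first nine columns only).
demo=$(mktemp)
{
  printf '#FusionName\tJunctionReadCount\tSpanningFragCount\tSpliceType\tLeftGene\tLeftBreakpoint\tRightGene\tRightBreakpoint\tLargeAnchorSupport\n'
  printf 'THRA--AC090627.1\t27\t93\tONLY_REF_SPLICE\tTHRA^ENSG00000126351.8\tchr17:38243106:+\tAC090627.1^ENSG00000235300.3\tchr17:46371709:+\tYES_LDAS\n'
  printf 'ZMYND8--CEP250\t2\t44\tONLY_REF_SPLICE\tZMYND8^ENSG00000101040.15\tchr20:45852970:-\tCEP250^ENSG00000126001.11\tchr20:34078463:+\tNO_LDAS\n'
  printf 'BRD4--RFX1\t1\t8\tONLY_REF_SPLICE\tBRD4^ENSG00000141867.13\tchr19:15443101:-\tRFX1^ENSG00000132005.4\tchr19:14109129:-\tNO_LDAS\n'
} > "$demo"
filter_fusions "$demo" 2 | cut -f1,2,9
rm -f "$demo"
```

In the demo only the THRA--AC090627.1 row survives alongside the header: ZMYND8--CEP250 has two junction reads but NO_LDAS, and BRD4--RFX1 fails both checks.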
III. Visualization
Starting from the STAR-Fusion results, use the companion tool FusionInspector to filter and consolidate them:
FusionInspector --fusions Sam_fusionList.txt \
-O Sample --genome_lib genome/genome \
--left_fq clean/Sam.R1.fq.gz --right_fq clean/Sam.R2.fq.gz \
--out_prefix finspector --vis # write the visualization files
This produces the following output files:
finspector.fa : the candidate fusion-gene contigs
finspector.bed : the reference gene structure annotations for fusion partners
finspector.junction_reads.bam : alignments of the breakpoint-junction supporting reads.
finspector.spanning_reads.bam : alignments of the breakpoint-spanning paired-end reads.
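The command above reads its candidates from Sam_fusionList.txt. As far as I understand it, that file is just one GeneA--GeneB name per line, so it can be derived from the first column of the STAR-Fusion table. A minimal sketch, where make_fusion_list is a hypothetical helper and the input is assumed to be tab-delimited with a '#'-prefixed header:

```shell
#!/usr/bin/env bash
# Print the unique fusion names (column 1, "GeneA--GeneB") from a
# STAR-Fusion-style table, skipping the '#'-prefixed header line.
make_fusion_list() {
  grep -v '^#' "$1" | cut -f1 | sort -u
}

# Demo with names taken from the table in section II; the same pair can
# appear on several rows (different breakpoints) but is listed only once.
demo=$(mktemp)
printf '#FusionName\tJunctionReadCount\n'  >  "$demo"
printf 'VAPB--IKZF3\t4\n'                  >> "$demo"
printf 'VAPB--IKZF3\t1\n'                  >> "$demo"
printf 'ACACA--STAC2\t12\n'                >> "$demo"
make_fusion_list "$demo"
rm -f "$demo"
```

In practice you would redirect the output, for example make_fusion_list path/to/predictions.tsv > Sam_fusionList.txt, where the input path is a placeholder for wherever your STAR-Fusion predictions were written.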
1. IGV
There are plenty of IGV tutorials online, and I'll share one of my own when I get the chance. Per the official documentation, the files to load into IGV are the four FusionInspector outputs listed above: finspector.fa, finspector.bed, finspector.junction_reads.bam, and finspector.spanning_reads.bam.
The IGV display looks rather unusual, but it clearly shows the two genes fused together, as below:
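2. chimeraviz
The Bioconductor package chimeraviz visualizes chimeric RNA. A visualization framework that automatically integrates RNA data with known genomic features is helpful when checking results.
Released in 2017 as a Bioconductor package, chimeraviz creates chimeric RNA visualizations automatically. The official tutorial, with detailed instructions, is on Bioconductor. Once the R package is installed, it ships with a set of test data for fusion visualization.
https://bioconductor.org/packages/release/bioc/html/chimeraviz.html
As the listing of the bundled example files below shows, sample results for the nine supported fusion detection tools are all included, among them my favorite, star-fusion: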
[1] "5267readsAligned.bam"
[2] "5267readsAligned.bam.bai"
[3] "aeron_fusion_support.txt"
[4] "aeron_fusion_transcripts.fa"
[5] "chimericJunctions_MCF-7.txt"
[6] "defuse_833ke_results.filtered.tsv"
[7] "ericscript_SRR1657556.results.total.tsv"
[8] "fusion5267and11759reads.bam"
[9] "fusion5267and11759reads.bam.bai"
[10] "fusion5267and11759reads.bedGraph"
[11] "fusioncatcher_833ke_final-list-candidate-fusion-genes.txt"
[12] "FusionMap_01_TestDataset_InputFastq.FusionReport.txt"
[13] "Homo_sapiens.GRCh37.74.sqlite"
[14] "Homo_sapiens.GRCh37.74_subset.gtf"
[15] "infusion_fusions.txt"
[16] "jaffa_results.csv"
[17] "oncofuse.outfile"
[18] "PRADA.acc.fusion.fq.TAF.tsv"
[19] "protein_domains_5267.bed"
[20] "reads.1.fq"
[21] "reads.2.fq"
[22] "reads_supporting_defuse_fusion_5267.1.fq"
[23] "reads_supporting_defuse_fusion_5267.2.fq"
[24] "soapfuse_833ke_final.Fusion.specific.for.genes"
[25] "squid_hcc1954_sv.txt"
[26] "star-fusion.fusion_candidates.final.abridged.txt"
[27] "star-fusion.fusion_predictions.abridged.annotated.coding_effect.tsv"
[28] "UCSC.HG19.Human.CytoBandIdeogram.txt"
[29] "UCSC.HG38.Human.CytoBandIdeogram.txt"
[30] "UCSC.MM10.Mus.musculus.CytoBandIdeogram.txt"
3. Circos
The classic Circos-style plot clearly shows both interchromosomal and intrachromosomal fusions. Here we again use the star-fusion results:
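library(chimeraviz)
# Path to the bundled STAR-Fusion example results
starFusion <- system.file(
  "extdata",
  "star-fusion.fusion_candidates.final.abridged.txt",
  package = "chimeraviz")
# Import at most 10 fusions, tagged with genome version hg38
fusions <- import_starfusion(starFusion, "hg38", 10)
# Circle plot of all imported fusions
plot_circle(fusions)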
In the plot, red links mark intrachromosomal fusions and blue links mark interchromosomal fusions.
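The same intra/inter split can be read straight off the STAR-Fusion table by comparing the chromosome part of the two breakpoint columns. This is only a sketch of the idea, not how chimeraviz does it internally; label_fusions is a hypothetical helper and assumes the tab-delimited column order shown in section II.

```shell
#!/usr/bin/env bash
# Label each fusion as intra- or interchromosomal by comparing the
# chromosome part of LeftBreakpoint (column 6) and RightBreakpoint (column 8).
# Breakpoints look like "chr17:38243106:+" in the STAR-Fusion table.
label_fusions() {
  awk -F'\t' '!/^#/ {
    split($6, left, ":"); split($8, right, ":")
    kind = (left[1] == right[1]) ? "intrachromosomal" : "interchromosomal"
    print $1 "\t" kind
  }' "$1"
}

# Demo on two rows abridged from the table in section II (first eight columns).
demo=$(mktemp)
{
  printf '#FusionName\tJunctionReadCount\tSpanningFragCount\tSpliceType\tLeftGene\tLeftBreakpoint\tRightGene\tRightBreakpoint\n'
  printf 'THRA--AC090627.1\t27\t93\tONLY_REF_SPLICE\tTHRA^ENSG00000126351.8\tchr17:38243106:+\tAC090627.1^ENSG00000235300.3\tchr17:46371709:+\n'
  printf 'VAPB--IKZF3\t4\t46\tONLY_REF_SPLICE\tVAPB^ENSG00000124164.11\tchr20:56964573:+\tIKZF3^ENSG00000161405.12\tchr17:37934020:-\n'
} > "$demo"
label_fusions "$demo"
rm -f "$demo"
```

The labels agree with the INTRACHROMOSOMAL / INTERCHROMOSOMAL tags in the annots column for these two fusions, which makes for a quick sanity check before plotting.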
Plotting a fusion gene
chimeraviz's built-in plotting function plot_fusion can draw a fusion gene from several angles, as follows:
library(chimeraviz)
# Bundled deFuse results and supporting files
defuse833ke <- system.file("extdata", "defuse_833ke_results.filtered.tsv", package = "chimeraviz")
fusion5267and11759reads <- system.file("extdata", "fusion5267and11759reads.bam", package = "chimeraviz")
edbSqliteFile <- system.file("extdata", "Homo_sapiens.GRCh37.74.sqlite", package = "chimeraviz")
count <- system.file("extdata", "fusion5267and11759reads.bedGraph", package = "chimeraviz")
# The bundled annotation is GRCh37, so hg19 is used here rather than hg38
fusions <- import_defuse(defuse833ke, "hg19")
# Look up fusions by partner gene name (returns a list), or pick one by id
rcc1_fusions <- get_fusion_by_gene_name(fusions, "RCC1")
fusion <- get_fusion_by_id(fusions, 5267)
edb <- ensembldb::EnsDb(edbSqliteFile)
# Coverage comes from the bedGraph; pass bamfile = fusion5267and11759reads instead to use the BAM
plot_fusion(fusion,
  edb = edb, non_ucsc = TRUE,
  reduce_transcripts = TRUE, bedgraphfile = count)
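Each way of presenting the results has its pros and cons:
1. IGV + FusionInspector: the workflow is tedious and leaves many redundant files, but the display is clear and easy to read;
2. chimeraviz: building a custom database file has limitations, but it accepts results from many fusion callers and offers a rich set of plot types;
3. Circos: preparing the input files and the conf is tedious, but the plots are highly customizable and look good.
Over two posts we have covered RNA-seq-based fusion detection and the usual ways of presenting fusions in SCI papers. So beyond the expression analysis everyone already knows, consider running a fusion analysis on your transcriptome data as well; you may find something unexpected!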
References:
Haas BJ, Dobin A, Li B, et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biology. 2019;20(1):1-16.
Lågstad S, Zhao S, Hoff AM, Johannessen B, Lingjærde OC, Skotheim RI. chimeraviz: a tool for visualizing chimeric RNA. Bioinformatics. 2017;33(18):2954-2956.