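maftools: a handy R package for tumor mutation analysis
I came across this package thanks to an introduction from Teacher Zeng (曾老师). To get to know a package, read its vignette first; all of its usage is documented there.
maftools vignette
1. Install and load the package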
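# biocLite() has been retired by Bioconductor; BiocManager is the supported installer on R >= 3.5
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("maftools")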
library(maftools)
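If installation reports missing dependencies, install them as prompted.
2. Read the MAF file
The MAF file used here is from TCGA-KIRC; there are many ways to download it.
options(stringsAsFactors = FALSE)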
laml = read.maf(maf = 'GDC/TCGA.KIRC.mutect.somatic.maf.gz')
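For orientation, a MAF file is a tab-delimited table with one row per variant. The sketch below builds a tiny, made-up MAF in memory with base R and checks for the columns that maftools treats as mandatory, as far as I understand its documentation. The sample barcodes, positions, and variants are invented for illustration.

```r
# Toy MAF content (invented rows); real GDC files carry many more columns.
maf_text <- paste(
  "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\tReference_Allele\tTumor_Seq_Allele2\tVariant_Classification\tVariant_Type\tTumor_Sample_Barcode",
  "VHL\tchr3\t100\t100\tC\tT\tMissense_Mutation\tSNP\tSAMPLE-01",
  "PBRM1\tchr3\t200\t201\tAG\t-\tFrame_Shift_Del\tDEL\tSAMPLE-01",
  "VHL\tchr3\t150\t150\tG\tA\tNonsense_Mutation\tSNP\tSAMPLE-02",
  sep = "\n"
)
toy_maf <- read.delim(text = maf_text, stringsAsFactors = FALSE)

# Columns assumed to be mandatory for read.maf()
required <- c("Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
              "Reference_Allele", "Tumor_Seq_Allele2",
              "Variant_Classification", "Variant_Type", "Tumor_Sample_Barcode")

# Any required column that is absent from the table
missing_cols <- setdiff(required, colnames(toy_maf))
print(missing_cols)
```

An empty result means the table has every column in the list; a real file can be checked the same way before handing it to read.maf().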
3. Summarize the MAF file
plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)
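[Figure: KIRC summary plot]
Missense_Mutation: missense mutation (a substitution that changes an amino acid)
Frame_Shift_Del: frameshift deletion
Nonsense_Mutation: nonsense mutation (creates a premature stop codon)
Frame_Shift_Ins: frameshift insertion
Splice_Site: splice-site mutation
In_Frame_Ins: in-frame insertion
In_Frame_Del: in-frame deletion
Translation_Start_Site: mutation at the translation start site
Nonstop_Mutation: nonstop mutation (loss of the stop codon)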
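To make the summary numbers concrete, here is a small base-R sketch of the kind of tallies behind such a dashboard: variants per class across the cohort, and the median number of variants per sample (the statistic requested with addStat = 'median'). The data frame is invented, and this is only an illustration, not how maftools computes its plot internally.

```r
# Invented per-variant records: one row per mutation
variants <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S2", "S3"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation", "Frame_Shift_Del",
                             "Missense_Mutation", "Nonsense_Mutation", "Splice_Site"),
  stringsAsFactors = FALSE
)

# Number of variants in each class across all samples, most frequent first
class_counts <- sort(table(variants$Variant_Classification), decreasing = TRUE)

# Number of variants in each sample, and the median of those counts
per_sample <- table(variants$Tumor_Sample_Barcode)
median_per_sample <- median(as.integer(per_sample))

print(class_counts)
print(median_per_sample)
```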
4. Draw the oncoplot (waterfall plot)
oncoplot(maf = laml, top = 30, fontSize = 12, showTumorSampleBarcodes = FALSE)
[Figure: oncoplot of the top 30 mutated genes in TCGA-KIRC (oncoplot_top30_TCGA_KIRC.png)]
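The top = 30 argument asks for the most frequently mutated genes. As a rough illustration of what "most frequently mutated" can mean, the sketch below ranks genes by the number of distinct samples in which they are mutated. The data are invented, and maftools' own ranking may differ in its details.

```r
# Invented mutation records; VHL is hit twice in sample S1
muts <- data.frame(
  Hugo_Symbol = c("VHL", "VHL", "VHL", "PBRM1", "PBRM1", "SETD2", "VHL"),
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S1", "S3", "S2", "S1"),
  stringsAsFactors = FALSE
)

# Count each gene at most once per sample
unique_hits <- unique(muts[, c("Hugo_Symbol", "Tumor_Sample_Barcode")])

# Number of mutated samples per gene, highest first
samples_per_gene <- sort(table(unique_hits$Hugo_Symbol), decreasing = TRUE)

# Keep the top N genes (N = 2 for this toy example)
top_n <- 2
top_genes <- names(samples_per_gene)[seq_len(min(top_n, length(samples_per_gene)))]
print(top_genes)
```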
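5. Draw the transition/transversion (Ti/Tv) boxplot
The boxplot shows the overall distribution of the six substitution classes, and a stacked barplot shows the fraction of each class in every sample.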
laml.titv = titv(maf = laml, plot = FALSE, useSyn = TRUE)
plotTiTv(res = laml.titv)
[Figure: Ti/Tv boxplot and stacked barplot]
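The six classes arise because each of the twelve possible single-base substitutions is collapsed onto its pyrimidine (C or T) reference base. A minimal base-R sketch of that classification, on invented alleles, might look like this; classify_snv is a hypothetical helper name, not a maftools function.

```r
# Watson-Crick complements
complement <- c(A = "T", C = "G", G = "C", T = "A")

# Hypothetical helper: label an SNV by its pyrimidine-reference class, e.g. G>A becomes C>T
classify_snv <- function(ref, alt) {
  purine_ref <- ref %in% c("A", "G")
  ref2 <- ifelse(purine_ref, complement[ref], ref)
  alt2 <- ifelse(purine_ref, complement[alt], alt)
  unname(paste0(ref2, ">", alt2))
}

# Invented reference and alternate alleles
ref <- c("C", "G", "T", "A", "C", "G")
alt <- c("T", "A", "C", "G", "A", "C")
classes <- classify_snv(ref, alt)

# Transitions are C>T and T>C; the other four classes are transversions
is_transition <- classes %in% c("C>T", "T>C")
titv_ratio <- sum(is_transition) / sum(!is_transition)

print(table(classes))
print(titv_ratio)
```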
6. Somatic interactions (co-occurrence and mutual exclusivity)
somaticInteractions(maf = laml, top = 25, pvalue = c(0.05, 0.1))
[Figure: somatic interactions plot (Rplot02.png)]
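As I understand it, this kind of analysis tests each gene pair with a Fisher's exact test on a 2x2 table of mutated versus unmutated samples. The sketch below does that for a single invented pair with base R; the two-way labelling by odds ratio is my simplification.

```r
# Invented mutation calls for 10 samples (TRUE = gene mutated in that sample)
gene_a <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
gene_b <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

# 2x2 contingency table; fixed factor levels keep empty cells in the table
tab <- table(
  gene_a = factor(gene_a, levels = c(TRUE, FALSE)),
  gene_b = factor(gene_b, levels = c(TRUE, FALSE))
)

res <- fisher.test(tab)

# Odds ratio above 1 suggests co-occurrence, below 1 mutual exclusivity
event <- if (res$estimate > 1) "co-occurrence" else "mutual exclusivity"

print(tab)
print(res$p.value)
print(event)
```

With real data the same test is repeated for every pair among the top genes, so the p-values should be read with multiple testing in mind.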
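7. Mutational signatures
The first step is to build the mutation matrix by extracting the bases immediately adjacent to each mutated base. The data here are aligned to hg38, whereas the example in the official vignette uses hg19.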
laml.tnm = trinucleotideMatrix(maf = laml, ref_genome = 'G:/ref/hg38/hg38.fa', add = TRUE, useSyn = TRUE)
reading G:/ref/hg38/hg38.fa (this might take few minutes)..
#Extracting 5' and 3' adjacent bases..
#Extracting +/- 20bp around mutated bases for background C>T estimation..
#Estimating APOBEC enrichment scores..
#Performing one-way Fisher's test for APOBEC enrichment..
#APOBEC related mutations are enriched in 4.167% of samples (APOBEC enrichment score > 2 ; 14 of 336 samples)
#Creating mutation matrix..
#matrix of dimension 336x96
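The 96 columns in that matrix correspond to the 6 substitution classes multiplied by 4 possible 5' bases and 4 possible 3' bases. A short base-R sketch that enumerates these contexts and fills a toy count matrix is shown below; the A[C>T]G label style is the common convention and is assumed here, and the sample names are invented.

```r
bases <- c("A", "C", "G", "T")
substitutions <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# Every combination of 5' base, substitution class, and 3' base
grid <- expand.grid(five = bases, sub = substitutions, three = bases,
                    stringsAsFactors = FALSE)
contexts <- paste0(grid$five, "[", grid$sub, "]", grid$three)

# Empty samples-by-contexts count matrix for two invented samples
mat <- matrix(0L, nrow = 2, ncol = length(contexts),
              dimnames = list(c("S1", "S2"), contexts))

# Record one C>T mutation flanked by T (5') and A (3') in sample S1
mat["S1", "T[C>T]A"] <- mat["S1", "T[C>T]A"] + 1L

print(length(contexts))
print(dim(mat))
```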
Visualize the differences between APOBEC-enriched and non-enriched samples
plotApobecDiff(tnm = laml.tnm, maf = laml)
$results
      Hugo_Symbol Enriched nonEnriched        pval       or    ci.up      ci.low adjPval
   1:     CD163L1        2           0 0.001616915      Inf 4.503567         Inf       1
   2:      CCDC54        2           1 0.004734561 50.87663 2.491826 3080.923605       1
   3:       CPXM1        2           1 0.004734561 50.87663 2.491826 3080.923605       1
   4:      DHRS7C        2           1 0.004734561 50.87663 2.491826 3080.923605       1
   5:        OPA1        2           1 0.004734561 50.87663 2.491826 3080.923605       1
3934:      ZSWIM8        0           5 1.000000000  0.00000 0.000000   26.888784       1
3935:       ZUFSP        0           3 1.000000000  0.00000 0.000000   58.517895       1
3936:      ZYG11B        0           3 1.000000000  0.00000 0.000000   58.517895       1
3937:       AKAP9        0          11 1.000000000  0.00000 0.000000    9.914261       1
3938:       XIRP2        0          11 1.000000000  0.00000 0.000000    9.914261       1
$SampleSummary
        Cohort SampleSize   Mean Median
1:    Enriched         14 37.786   34.0
2: nonEnriched        322 55.220   47.5
[Figure: APOBEC-enriched vs. non-enriched samples (Rplot.png)]
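To see where a pval in the $results table could come from, the sketch below runs a Fisher's exact test on the CCDC54 row, using the cohort sizes from $SampleSummary (14 enriched, 322 non-enriched). That the column is computed this way is my assumption, not something the output states.

```r
# Counts taken from the output above
n_enriched   <- 14   # APOBEC-enriched samples
n_non        <- 322  # non-enriched samples
mut_enriched <- 2    # enriched samples with a CCDC54 mutation
mut_non      <- 1    # non-enriched samples with a CCDC54 mutation

# 2x2 table: mutated / wild type versus enriched / non-enriched
tab <- matrix(
  c(mut_enriched, n_enriched - mut_enriched,
    mut_non,      n_non - mut_non),
  nrow = 2,
  dimnames = list(c("mutated", "wild_type"), c("Enriched", "nonEnriched"))
)

res <- fisher.test(tab)
print(signif(res$p.value, 7))
```

The resulting p-value should agree with the 0.004734561 reported for CCDC54. Note that adjPval is 1 for every gene shown, so none of these differences survives multiple-testing correction.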
Signature analysis
library(NMF)
[Figure: Rplot03.png]
laml.sign = extractSignatures(mat = laml.tnm, nTry = 6, plotBestFitRes = FALSE)
Estimating best rank..
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#2 [r=3] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#3 [r=4] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#4 [r=5] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#5 [r=6] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
In addition: Warning messages:
1: In str_c(names(x), "=") : restarting interrupted promise evaluation
2: In str_c(names(x), "=") : restarting interrupted promise evaluation
3: In str_c(names(x), "=") : restarting interrupted promise evaluation
4: In str_c(names(x), "=") : restarting interrupted promise evaluation
5: In str_c(names(x), "=") : restarting interrupted promise evaluation
This raised an error.
The key part is "/stringi/R/stringi.rdb': No such file or directory": this file is missing.
Try reinstalling the package:
install.packages("stringi")
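Before reinstalling, it can help to check which packages actually fail to load. The sketch below uses base R's requireNamespace() for that; can_load is a hypothetical helper name, and a check like this may not catch every kind of damaged installation.

```r
# Hypothetical helper: TRUE for each package whose namespace can be loaded
can_load <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# "stats" ships with R; "stringi" is the package named in the error above
status <- can_load(c("stats", "stringi"))
broken <- names(status)[!status]

if (length(broken) > 0) {
  message("Consider reinstalling: ", paste(broken, collapse = ", "))
}
print(status)
```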
Run it again:
laml.sign = extractSignatures(mat = laml.tnm, nTry = 6, plotBestFitRes = FALSE)
Estimating best rank..
method seed rng metric rank sparseness.basis sparseness.coef rss evar silhouette.coef silhouette.basis
1: brunet random 4 KL 2 0.3294052 0.1553251 23710.81 0.5991749 1.0000000 1.0000000
2: brunet random 2 KL 3 0.3255095 0.2536579 23227.82 0.6073398 0.6526353 0.7024022
3: brunet random 1 KL 4 0.3956116 0.3326660 21245.44 0.6408514 0.4571158 0.5157059
4: brunet random 2 KL 5 0.4523101 0.3241846 20710.76 0.6498899 0.3969265 0.5686192
5: brunet random 3 KL 6 0.4780410 0.3445228 20325.10 0.6564095 0.3227373 0.4817975
residuals niter cpu cpu.all nrun cophenetic dispersion silhouette.consensus
1: 15156.92 1760 NA NA 10 0.9797910 0.8606186 0.9152344
2: 14753.96 2000 NA NA 10 0.7632717 0.3654457 0.3744583
3: 14369.35 2000 NA NA 10 0.7531582 0.4183581 0.3061800
4: 14032.47 2000 NA NA 10 0.7078448 0.4989598 0.2487093
5: 13708.91 2000 NA NA 10 0.6732816 0.5472385 0.2090123
Using 3 as a best-fit rank based on decreasing cophenetic correlation coefficient.
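One rule that reproduces the choice of 3 is to take the first rank at which the cophenetic correlation drops relative to the previous rank. The sketch below applies that rule to the cophenetic values printed above. It is my reading of the log message, not a confirmed description of the package internals.

```r
# Ranks tried and the cophenetic values from the table above
ranks <- 2:6
cophenetic <- c(0.9797910, 0.7632717, 0.7531582, 0.7078448, 0.6732816)

# Change in cophenetic correlation relative to the previous rank (0 for the first)
change <- c(0, diff(cophenetic))

# First rank at which the coefficient decreases
best_rank <- ranks[which(change < 0)[1]]
print(best_rank)
```

Under this rule rank 2 can never be selected, which is worth keeping in mind given that rank 2 has by far the highest cophenetic value here.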
Comparing against experimentally validated 30 signatures.. (See http://cancer.sanger.ac.uk/cosmic/signatures for details.)
Found Signature_1 most similar to validated Signature_5. Aetiology: Unknown [cosine-similarity: 0.757]
Found Signature_2 most similar to validated Signature_3. Aetiology: defects in DNA-DSB repair by HR [cosine-similarity: 0.848]
Found Signature_3 most similar to validated Signature_1. Aetiology: spontaneous deamination of 5-methylcytosine [cosine-similarity: 0.876]
[Figure: Rplot04.png]
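The cosine-similarity values in the log measure how closely each extracted signature resembles a reference signature. A minimal base-R sketch of that comparison follows; the profiles are invented and use 6 channels instead of 96 to keep the example short, and Ref_A and Ref_B are placeholder names.

```r
# Cosine similarity between two non-negative profiles
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Invented extracted signature (6 channels for brevity)
extracted <- c(0.10, 0.05, 0.50, 0.05, 0.25, 0.05)

# Invented reference signatures, one per row
reference <- rbind(
  Ref_A = c(0.12, 0.04, 0.48, 0.06, 0.24, 0.06),
  Ref_B = c(0.40, 0.30, 0.05, 0.10, 0.05, 0.10)
)

# Similarity of the extracted signature to every reference, then the best match
sims <- apply(reference, 1, cosine_similarity, b = extracted)
best_match <- names(which.max(sims))

print(round(sims, 3))
print(best_match)
```

A value near 1 means nearly identical profiles. The values in this run (0.757 to 0.876) indicate resemblance rather than a clean match, so the assigned aetiologies are best treated as tentative.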