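Installing and Using GISTIC2 to Analyze TCGA Copy Number Variation
2022-07-12
生信开荒牛
GISTIC2 is a tool for analyzing copy number variation (CNV): it identifies genomic regions that are significantly amplified or deleted across a set of samples.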
Download
Download location: ftp://ftp.broadinstitute.org/pub/GISTIC2.0/

wget ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTIC_2_0_23.tar.gz
Extract
tar zxvf GISTIC_2_0_23.tar.gz
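In case the archive unpacks its files directly into the current directory rather than into a folder of its own, it can help to extract into a dedicated directory. A minimal sketch: the helper name `extract_into` and the target directory `gistic2` are illustrative choices, not part of GISTIC2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a .tar.gz archive into its own directory.
extract_into() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"                  # create the target directory if needed
  tar -zxf "$archive" -C "$dest"    # -C changes into $dest before extracting
}

# Usage with the archive downloaded above (only runs if the file is present).
if [ -f GISTIC_2_0_23.tar.gz ]; then
  extract_into GISTIC_2_0_23.tar.gz gistic2
fi
```

With this layout, the remaining steps would be run from inside the `gistic2` directory.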
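Install the MCR (MATLAB Compiler Runtime)

GISTIC2 is a compiled MATLAB program, so running it on Linux requires the MATLAB Compiler Runtime, which is installed from the MCR_Installer folder.
Note that the installer needs a Java environment.
conda activate rna # my existing rna environment already has Java
java -version # check the Java version
cd MCR_Installer
unzip MCRInstaller.zip
chmod 744 installer_input.txt # change permissions
./install -mode silent -agreeToLicense yes -destinationFolder /mnt/d/bioinfo/biosoft/gistic2/MATLAB_Compiler_Runtime
The installation completes successfully.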

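Running GISTIC2
First, a few settings in the gistic2 script need to be changed.

Since I am on Windows, I opened the script directly in Notepad++. The main changes are the MCR_ROOT variable and the full path used to call gp_gistic2_from_seg. When editing this way, make sure the file keeps Unix (LF) line endings.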
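The same edit can be scripted from the shell, which also removes any Windows (CRLF) line endings an editor may have introduced. A minimal sketch, assuming the wrapper defines `MCR_ROOT=` on a line of its own and launches `./gp_gistic2_from_seg` at the start of a line. The helper name `patch_gistic_wrapper` and the stand-in demo file are hypothetical, and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite MCR_ROOT and the gp_gistic2_from_seg call
# in a GISTIC2 wrapper script. Assumes GNU sed (for -i and \r).
patch_gistic_wrapper() {
  local script="$1" mcr_root="$2" gistic_dir="$3"
  sed -i 's/\r$//' "$script"                               # drop CR from CRLF line endings
  sed -i "s|^MCR_ROOT=.*|MCR_ROOT=${mcr_root}|" "$script"   # point at the MCR install
  sed -i "s|^\./gp_gistic2_from_seg|${gistic_dir}/gp_gistic2_from_seg|" "$script"
}

# Demo on a stand-in file that mimics the assumed layout of the wrapper.
demo=$(mktemp)
printf 'MCR_ROOT=`pwd`/MATLAB_Compiler_Runtime\r\n./gp_gistic2_from_seg $@\r\n' > "$demo"
patch_gistic_wrapper "$demo" \
  /mnt/d/bioinfo/biosoft/gistic2/MATLAB_Compiler_Runtime \
  /mnt/d/bioinfo/biosoft/gistic2
cat "$demo"
rm -f "$demo"
```

On the real script, the call would be `patch_gistic_wrapper ./gistic2 <MCR directory> <GISTIC2 directory>`. Checking the result with `cat` before running it is a good idea, since the actual layout of the wrapper may differ from what is assumed here.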

Run the bundled example script, run_gistic_example:
./run_gistic_example
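It fails with the following error:

libncurses.so.5 is missing.
Fix: install the libncurses5 package, which provides libncurses.so.5. The .deb can be downloaded manually from a Debian mirror:
wget http://ftp.br.debian.org/debian/pool/main/n/ncurses/libncurses5_6.2+20201114-2_amd64.deb
On Debian/Ubuntu systems where the package is available in the repositories, installing it with apt-get is enough:
sudo apt-get install libncurses5
## Run the example again
./run_gistic_example
The results are written to the example_results folder.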
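To list every missing shared library at once instead of discovering them one error at a time, `ldd` can be run on the executable. A minimal sketch: the helper name `missing_libs` is hypothetical, and libraries that ship with the MCR may also be reported as missing unless `LD_LIBRARY_PATH` includes the MCR directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the shared libraries a binary needs
# but the dynamic loader cannot find.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Only runs when executed from the GISTIC2 directory.
if [ -x ./gp_gistic2_from_seg ]; then
  missing_libs ./gp_gistic2_from_seg
fi
```

An empty result means the loader resolved everything it was asked about.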

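With the software installed, we can move on to analyzing TCGA data. I chose CHOL (cholangiocarcinoma) because it has relatively few samples.
1. Download the TCGA segment data with R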
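rm(list = ls())
options(stringsAsFactors = FALSE)
# BiocManager::install(c("dplyr", "TCGAbiolinks"))
library(dplyr)
library(TCGAbiolinks)
# Query the masked copy number segments for TCGA-CHOL
query <- GDCquery(project = "TCGA-CHOL",
                  data.category = "Copy Number Variation",
                  data.type = "Masked Copy Number Segment")
GDCdownload(query, method = "api", files.per.chunk = 100)
segment_dat <- GDCprepare(query = query)
# Shorten the barcode to its first 16 characters (down to the sample and vial code, e.g. 01A)
segment_dat$Sample <- substring(segment_dat$Sample, 1, 16)
# Keep only samples whose barcode ends in 01A (primary tumor)
segment_dat <- grep("01A$", segment_dat$Sample) %>%
  segment_dat[., ]
# Put the sample ID in column 1, then drop the now redundant column 7
segment_dat[, 1] <- segment_dat$Sample
segment_dat <- segment_dat[, -7]
# Write a tab-separated segment file without a header row
write.table(segment_dat, "MaskedCopyNumberSegment.txt", sep = "\t",
            quote = FALSE, col.names = FALSE, row.names = FALSE)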
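Before handing the file to GISTIC2, a quick look from the shell can catch formatting problems. The sketch below checks that every line has six tab-separated fields (sample, chromosome, start, end, number of probes, segment mean) and counts the distinct samples. The helper names and the two-line demo file are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a headerless, tab-separated segment file.

# Succeeds only if every line has exactly six tab-separated fields.
seg_has_six_columns() {
  awk -F'\t' 'NF != 6 { bad = 1 } END { exit bad }' "$1"
}

# Prints the number of distinct sample IDs (column 1).
seg_sample_count() {
  cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Demo with invented values.
demo_seg=$(mktemp)
printf 'TCGA-XX-0001-01A\t1\t100\t200\t10\t0.12\n' >  "$demo_seg"
printf 'TCGA-XX-0002-01A\t1\t100\t500\t25\t-0.40\n' >> "$demo_seg"
if seg_has_six_columns "$demo_seg"; then echo "columns OK"; fi
echo "samples: $(seg_sample_count "$demo_seg")"
rm -f "$demo_seg"
```

On the real data, the same two functions would be called with `MaskedCopyNumberSegment.txt` as the argument.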
2. Prepare the markers file
Download location: https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files

Read it into R
## Markers file
snp <- read.table('snp6.na35.remap.hg38.subset.txt', header = TRUE)
# Keep only probes whose freqcnv flag is FALSE
snp <- snp[snp$freqcnv == 'FALSE', ]
# Keep the first three columns: marker name, chromosome, position
snp <- snp[, 1:3]
colnames(snp) <- c("Marker name", "chromosome", "Marker position")
write.table(snp, 'Marker.txt', sep = "\t",
            quote = FALSE, col.names = FALSE, row.names = FALSE)
With "MaskedCopyNumberSegment.txt" and 'Marker.txt' ready, the GISTIC analysis can be run.
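One thing worth checking at this point is that both files name chromosomes the same way (for example `1` versus `chr1`), since a mismatch can cause problems when segments are matched to markers. A minimal sketch with a hypothetical helper name, assuming the chromosome is column 2 in both headerless, tab-separated files, as written by the R code above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each chromosome label that occurs in the
# segment file but never in the markers file (column 2 in both).
chroms_missing_from_markers() {
  local seg="$1" markers="$2"
  # First pass (markers file) records labels; second pass (segment file)
  # prints each unseen label once.
  awk -F'\t' 'NR == FNR { seen[$2] = 1; next }
              !($2 in seen) && !printed[$2]++ { print $2 }' "$markers" "$seg"
}

if [ -f MaskedCopyNumberSegment.txt ] && [ -f Marker.txt ]; then
  chroms_missing_from_markers MaskedCopyNumberSegment.txt Marker.txt
fi
```

No output means every chromosome label in the segment file also appears in the markers file.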
3. Run GISTIC2
Following the example script, only the basedir, segfile, and markersfile paths need to be changed.
## output directory
echo --- creating output directory ---
basedir=/mnt/d/bioinfo/biosoft/gistic2/chol
mkdir -p $basedir
echo --- running GISTIC ---
segfile=/mnt/d/bioinfo/biosoft/gistic2/input/MaskedCopyNumberSegment.txt
markersfile=/mnt/d/bioinfo/biosoft/gistic2/input/Marker.txt
refgenefile=/mnt/d/bioinfo/biosoft/gistic2/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat
./gistic2 -b $basedir -seg $segfile -mk $markersfile -refgene $refgenefile -genegistic 1 -smallmem 1 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme
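Once the run finishes, it can be worth confirming that the result files used for plotting are present in the output directory. With `-conf 0.90` their names carry the `conf_90` suffix. A minimal sketch; the helper name `check_gistic_outputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the four result files are missing
# or empty in a GISTIC2 output directory. Returns non-zero if any is.
check_gistic_outputs() {
  local dir="$1" status=0 f
  for f in all_lesions.conf_90.txt amp_genes.conf_90.txt \
           del_genes.conf_90.txt scores.gistic; do
    if [ ! -s "$dir/$f" ]; then     # -s: file exists and is not empty
      echo "missing or empty: $dir/$f"
      status=1
    fi
  done
  return "$status"
}

basedir=/mnt/d/bioinfo/biosoft/gistic2/chol
if check_gistic_outputs "$basedir"; then
  echo "all four files present"
fi
```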
4. Visualize the results with maftools
Four output files are used:
"all_lesions.conf_90.txt", "amp_genes.conf_90.txt", "del_genes.conf_90.txt", "scores.gistic"
library(maftools)
chol.gistic <- readGistic(gisticAllLesionsFile = "all_lesions.conf_90.txt",
                          gisticAmpGenesFile = "amp_genes.conf_90.txt",
                          gisticDelGenesFile = "del_genes.conf_90.txt",
                          gisticScoresFile = "scores.gistic",
                          isTCGA = TRUE)
Chromosome plot
gisticChromPlot(gistic=chol.gistic, markBands="all")

Bubble plot
gisticBubblePlot(gistic=chol.gistic)

Oncoplot
gisticOncoPlot(gistic = chol.gistic, sortByAnnotation = TRUE, top =10)

References:
http://www.360doc.com/content/21/0714/12/76149697_986501287.shtml
https://www.jianshu.com/p/508462937ca7