Bioinformatics Notes 15: TCGA Data Download and Survival Analysis

2023-03-28, by 江湾青年

This note uses esophageal carcinoma (ESCA) as the example.

First, download the ESCA data with the R package TCGAbiolinks.

library(TCGAbiolinks)
query.esca <- GDCquery(project = "TCGA-ESCA", 
                       data.category = "Transcriptome Profiling", 
                       data.type = "Gene Expression Quantification", 
                       workflow.type = "STAR - Counts")
GDCdownload(query.esca)

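Running the code above creates a GDCdata folder in your current working directory, connects to the TCGA data portal (GDC) automatically, and downloads the files into that folder.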
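On a slow or unstable connection, fetching the files in smaller batches can make the download more robust. A minimal sketch, assuming the `directory` and `files.per.chunk` arguments of `GDCdownload()`; the helper name `download_esca` and the default batch size of 20 are hypothetical choices made here for illustration:

```r
# Hypothetical helper (not part of TCGAbiolinks): download a query in batches.
download_esca <- function(query, directory = "GDCdata", chunk_size = 20) {
  # files.per.chunk splits the download into batches of `chunk_size` files.
  TCGAbiolinks::GDCdownload(query,
                            directory = directory,
                            files.per.chunk = chunk_size)
  # Return the absolute path of the folder the files were written to.
  normalizePath(directory)
}
```

It would be called as `download_esca(query.esca)`. If you pass a non-default `directory`, pass the same value to `GDCprepare()` so that it can find the files.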

# Merge all samples into a single SummarizedExperiment object
esca <- GDCprepare(query.esca)
# Define ESCC and ESAD from the primary diagnosis
table(esca$primary_diagnosis)
# as.character() is called directly so that no pipe operator (%>%) has to be loaded
esca$tumor_type <- as.character(factor(esca$primary_diagnosis,
                          levels = c('Adenocarcinoma, NOS','Basaloid squamous cell carcinoma','Mucinous adenocarcinoma',
                          'Squamous cell carcinoma, keratinizing, NOS','Squamous cell carcinoma, NOS','Tubular adenocarcinoma'),
                          labels = c('ESAD','ESCC','ESAD','ESCC','ESCC','ESAD')))
table(esca$tumor_type)
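The relabeling above relies on `factor()` accepting repeated labels (R 3.4.0 or later), which merges several diagnosis strings into one subtype. A minimal sketch on a made-up vector (the `diagnosis` values are illustrative, not real TCGA records) shows this, and also that any diagnosis not listed in `levels` silently becomes `NA`:

```r
# Toy diagnosis vector; values are illustrative only, not real TCGA records.
diagnosis <- c("Adenocarcinoma, NOS",
               "Squamous cell carcinoma, NOS",
               "Tubular adenocarcinoma",
               "Some unlisted diagnosis")

# Map each diagnosis string to a subtype label; repeated labels merge levels.
subtype_levels <- c("Adenocarcinoma, NOS",
                    "Squamous cell carcinoma, NOS",
                    "Tubular adenocarcinoma")
subtype_labels <- c("ESAD", "ESCC", "ESAD")

tumor_type <- as.character(factor(diagnosis,
                                  levels = subtype_levels,
                                  labels = subtype_labels))

# Diagnoses missing from `levels` come back as NA, so count them explicitly.
print(table(tumor_type, useNA = "ifany"))
```

On the real data, `table(esca$tumor_type, useNA = "ifany")` is a quick way to check whether any sample fell outside the listed diagnoses.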

Next, extract the TPM expression matrix for the ESCC samples and run the survival analysis with TCGAanalyze_survival(), a function that ships with TCGAbiolinks.

# Extract the ESCC samples
escc <- esca[,which(esca$tumor_type == 'ESCC')]
# Extract the TPM expression matrix (rows: gene names, columns: sample barcodes)
tpm <- escc@assays@data$tpm_unstrand
dimnames(tpm) <- list(escc@rowRanges$gene_name,escc@colData@rownames)
# Survival analysis by gender, restricted to the ESCC subset
TCGAanalyze_survival(escc@colData,clusterCol = 'gender')
# Survival analysis by CREBBP expression (median split into high / low)
escc$CREBBP_exp <- 'CREBBP_high'
escc$CREBBP_exp[which(tpm['CREBBP',] < median(tpm['CREBBP',]))] <- 'CREBBP_low'
TCGAanalyze_survival(escc@colData,clusterCol = 'CREBBP_exp')
# Same analysis for SIRT7
escc$SIRT7_exp <- 'SIRT7_high'
escc$SIRT7_exp[which(tpm['SIRT7',] < median(tpm['SIRT7',]))] <- 'SIRT7_low'
TCGAanalyze_survival(escc@colData,clusterCol = 'SIRT7_exp')
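To make the median split and the group comparison easier to follow, here is a minimal sketch on a toy table that uses the survival package directly. It is not the implementation of TCGAanalyze_survival(); it only illustrates the same idea (Kaplan-Meier curves per group plus a log-rank test). All values in `clin` are made up, the column `expr` is a hypothetical stand-in for a gene's TPM values, and the rule for the survival time (days to death for deceased patients, otherwise days to last follow-up) is an assumption about how TCGA clinical columns are usually combined.

```r
library(survival)  # recommended package shipped with standard R installs

# Toy clinical table; all values are made up for illustration.
clin <- data.frame(
  vital_status           = c("Dead", "Alive", "Dead", "Dead", "Alive", "Alive"),
  days_to_death          = c(300, NA, 150, 800, NA, NA),
  days_to_last_follow_up = c(NA, 900, NA, NA, 700, 1200),
  expr                   = c(2.1, 8.4, 1.3, 6.6, 5.0, 9.2)
)

# Overall survival time: days to death if dead, else days to last follow-up.
clin$event <- as.integer(clin$vital_status == "Dead")
clin$time  <- ifelse(clin$event == 1,
                     clin$days_to_death,
                     clin$days_to_last_follow_up)

# Median split, same rule as above: below the median is "low", otherwise "high".
clin$group <- ifelse(clin$expr < median(clin$expr), "low", "high")

# Kaplan-Meier fit per group and log-rank test between the two groups.
fit <- survfit(Surv(time, event) ~ group, data = clin)
lr  <- survdiff(Surv(time, event) ~ group, data = clin)
print(fit)
print(lr)
```

With an even number of samples, none sits exactly on the median, so the two groups are the same size; ties at the median all go to the "high" group under this rule.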

References

https://mp.weixin.qq.com/s?__biz=MzA5ODQ1NDIyMQ==&mid=2649712070&idx=3&sn=742f798f93155718eb6a79dcb82d6ca6&chksm=888a9a64bffd1372c5b74f5f08e608d8bcc9375a50a422ca9aab3225beaeec9543a3ec54b07d&scene=27

Further reading: https://www.jianshu.com/p/fd5e06ec260b
Manual data download method: https://zhuanlan.zhihu.com/p/563936447
