生信分析流程生物信息学R语言源码小白的数据分析学习

生存分析-黑、白、许多灰

2019-06-09  本文已影响85人  Juan_NF

黑白许多灰,是亦舒的一本书,感觉用在这里还挺合适的;
是我们看到的世界,也是我们接触到的统计模型;
人狠话不多,嘴还不碎送给学统计的一干人等;
总之,即使许多灰,还是尽量做到透彻吧;

1.TCGA数据库 patient_barcode--那些需要了解的“暗号”

编码 代表内容 缩写
1 Primary Solid Tumor TP
2 Recurrent Solid Tumor TR
3 Primary Blood Derived Cancer - Peripheral Blood TB
4 Recurrent Blood Derived Cancer - Bone Marrow TRBM
5 Additional - New Primary TAP
6 Metastatic TM
7 Additional Metastatic TAM
8 Human Tumor Original Cells THOC
9 Primary Blood Derived Cancer - Bone Marrow TBM
10 Blood Derived Normal NB
11 Solid Tissue Normal NT
12 Buccal Cell Normal NBC
13 EBV Immortalized Normal NEBV
14 Bone Marrow Normal NBM
15 sample type 15 15SH
16 sample type 16 16SH
20 Control Analyte CELLC
40 Recurrent Blood Derived Cancer - Peripheral Blood TRB
50 Cell Lines CELL
60 Primary Xenograft Tissue XP
61 Cell Line Derived Xenograft Tissue XCL
99 sample type 99 99SH

2.TCGA数据下载

library(TCGAbiolinks)
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification", 
                  workflow.type = "HTSeq - Counts",
                  barcode = clin[1:500,1])
GDCdownload(query)
BRC_DATA1<-GDCprepare(query,save=FALSE)
library(SummarizedExperiment)
BRC_DATA1_1 <- assay(BRC_DATA1)
library(TCGAbiolinks)
clinic <- GDCquery_clinic(project = "TCGA-BRCA",
                          type = 'Clinical')

TCGA中TNBC的数据是基于Her2、ER、PR的IHC结果来进行划分的,相应的表型数据是这样获得的:内容整理自TNBCsub

1.点击链接[GDC Data],选择界面右下角Legacy Archive(https://portal.gdc.cancer.gov/)


2.Primary Site和Project选择;

3.Files中,Data Type选择Clinical data,Data Format选择Biotab,右侧文件选择nationwidechildrens.org_clinical_patient_brca.txt

3.KMplot

a.看下这个图是用了什么数据?
b.看下这些数据是怎么来的?
c.如何区分是否有差异?log-rank test

library(survival)
library(survminer)
fit.surv1 <-Surv(tumor_clin$time,as.numeric(tumor_clin$Status))
fit<-survfit(fit.surv1~group,data=voom_group)
summary(fit)
ggsurvplot(fit)

tumor_clin$time生存分析中的时间;
tumor_clin$Status生存分析中的Alive-0和Dead-1;
group对应的数据中的分组信息的列的列名;
summary(fit)查看数据

clin_group<- tumor_clin[,c('Status','time')]
group <- voom_group[,1]
#####取出对应分组的临床信息
group_1 <- clin_group[group==1,]
group_2 <- clin_group[group==2,]
#####取出对应分组的对应Status(Alive|Death)信息
group_1_A<- group_1[group_1$Status==0,]
group_1_D<- group_1[group_1$Status==1,]
group_2_A<- group_2[group_2$Status==0,]
group_2_D<- group_2[group_2$Status==1,]
#####对1组下的Status信息按照time进行排序
A_1<- group_1_A[order(group_1_A$time),]
D_1<- group_1_D[order(group_1_D$time),]
######将相同time的event的计数进行加和
D_1<- data.frame(time = as.numeric(D_1$time[!duplicated(D_1$time)]),
               event= as.numeric(by(D_1,D_1$time, function(x){sum(x[,1])})))
#####将相同time的censor(alive)的计数进行加和
A_1<- data.frame(time = as.numeric(A_1$time[!duplicated(A_1$time)]),
                 censor= as.numeric(by(A_1,A_1$time, function(x){length(x[,1])})))
#####这一组的总人数-event(death)-censor(alive|依据时间排序后,`此death对应时间`之前的alive)
y<- function(x){545-sum(D_1$event[D_1$time<x[1]])-sum(A_1$censor[A_1$time<x[1]])}
D_1$n.risk<- apply(D_1,1,y)
#####每一个时间点的survival=(risk(number of alive)-event)/risk(number of alive)
D_1$step.survival <- apply(D_1,1,function(x){(x[3]-x[2])/x[3]})
#####截止到此时间点的survival=此时间点之前的survival*此时间点的survival
D_1$surv <- apply(D_1,1,function(x){prod(D_1$step.survival[D_1$time<=x[1]])})
> D_1
   time event n.risk step.survival      surv
1    26     1    519     0.9980732 0.9980732
2   116     1    501     0.9980040 0.9960811
3   158     1    500     0.9980000 0.9940889
4   160     1    499     0.9979960 0.9920967
5   172     1    496     0.9979839 0.9900965
6   174     1    494     0.9979757 0.9880923
7   302     1    480     0.9979167 0.9860338
8   320     1    473     0.9978858 0.9839491
9   322     1    471     0.9978769 0.9818601
10  336     1    467     0.9978587 0.9797576
11  348     1    463     0.9978402 0.9776415
12  365     1    458     0.9978166 0.9755069
13  377     1    447     0.9977629 0.9733245
14  385     2    441     0.9954649 0.9689104
15  426     1    423     0.9976359 0.9666198
16  446     1    412     0.9975728 0.9642736
17  524     1    372     0.9973118 0.9616815
18  538     1    366     0.9972678 0.9590540
19  558     1    355     0.9971831 0.9563524
20  571     1    351     0.9971510 0.9536278
21  584     1    344     0.9970930 0.9508556
22  612     1    332     0.9969880 0.9479916
23  614     1    331     0.9969789 0.9451275
#####未展示所有,画图用的是time和surv这两列

4.log-rank检验

fit.surv1 <-Surv(tumor_clin$time,as.numeric(tumor_clin$Status))
log_rank_p <- apply(tumor_voom, 1, function(values1){
  group=ifelse(values1>median(values1),2,1)
  data.survdiff=survdiff(fit.surv1~group)
  p.val = 1 - pchisq(data.survdiff$chisq, length(data.survdiff$n) - 1)
  return( p.val)
})
> data.survdiff
Call:
survdiff(formula = fit.surv1 ~ group, data = voom_group)

          N Observed Expected (O-E)^2/E (O-E)^2/V
group=1 545       79     72.1     0.656      1.25
group=2 545       73     79.9     0.592      1.25

 Chisq= 1.3  on 1 degrees of freedom, p= 0.3 

5.coxph(Cox proportional hazards)

fit.surv1 <-Surv(tumor_clin$time,as.numeric(tumor_clin$Status))
cox_p <- apply(tumor_voom, 1, function(values1){
  group=ifelse(values1>median(values1),2,1)
  fit <- coxph(fit.surv1~group)
  fit1 <- exp(fit$coefficients)
  p.val <- summary(fit)$logtest["pvalue"]
  result <- list(fit1,p.val)
  return(result)
})
> summary(coxph)
Call:
coxph(formula = fit.surv1 ~ group, data = voom_group)

  n= 1090, number of events= 152 

         coef exp(coef) se(coef)      z Pr(>|z|)
group -0.1818    0.8338   0.1627 -1.117    0.264

      exp(coef) exp(-coef) lower .95 upper .95
group    0.8338      1.199    0.6061     1.147

Concordance= 0.546  (se = 0.024 )
Likelihood ratio test= 1.25  on 1 df,   p=0.3
Wald test            = 1.25  on 1 df,   p=0.3
Score (logrank) test = 1.25  on 1 df,   p=0.3

HR = 1: 无风险
HR < 1: 风险降低
HR > 1: 风险增加

【参考内容】
1.各种格式TCGA数据下载
2.cox-survival
3.survival rate计算


课程分享
生信技能树全球公益巡讲
https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g
B站公益74小时生信工程师教学视频合辑
https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw
招学徒:
https://mp.weixin.qq.com/s/KgbilzXnFjbKKunuw7NVfw

上一篇下一篇

猜你喜欢

热点阅读