
Topic 12 Clinical Prediction Models: Nomogram

3. Predicted probability: for example, the axis labelled "5-year survival prob" in the figure gives the probability of surviving 5 years.




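Let us look at the role nomograms play in articles published in journals with an impact factor above 10. We pick one high-impact clinical medicine paper as an example, shown below: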

The position of the nomogram in that paper (Figure ) is shown below:

Cox regression model

Cox regression can be fitted with either the survival package or the rms package, so we try the function from each and compare them. First load the survival and rms packages:

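# Install each package if it is missing, then load it
if (!require(survival)) { install.packages("survival"); library(survival) }
if (!require("rms")) { install.packages("rms"); library(rms) }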

1. Reading the data

We again use the NCCTG Lung Cancer Data set that ships with the survival package as input data. Its description reads:


Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.

data(package = "survival")
# 2. Packaging the data
lung$sex = factor(lung$sex)
dd <- datadist(lung)
options(datadist = "dd")
## $limits
##                 inst    time status age  sex ph.ecog ph.karno pat.karno
## Low:effect         3  166.75      1  56 <NA>       0       75        70
## Adjust to         11  255.50      1  63    1       1       80        80
## High:effect       16  396.50      2  69 <NA>       1       90        90
## Low:prediction     1   31.00      1  44    1       0       60        60
## High:prediction   26  740.00      2  76    2       2      100       100
## Low                1    5.00      1  39    1       0       50        30
## High              33 1022.00      2  82    2       3      100       100
##                  meal.cal   wt.loss
## Low:effect       635.0000   0.00000
## Adjust to        975.0000   7.00000
## High:effect     1150.0000  15.75000
## Low:prediction   312.4361  -5.00000
## High:prediction 1500.0000  35.23348
## Low               96.0000 -24.00000
## High            2600.0000  68.00000
## $values
## $values$status
## [1] 1 2
## $values$sex
## [1] "1" "2"
## $values$ph.ecog
## [1] 0 1 2 3
## $values$ph.karno
## [1] 50 60 70 80 90 100
## $values$pat.karno
## [1] 30 40 50 60 70 80 90 100
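
The `Low:effect`, `Adjust to` and `High:effect` rows above can be checked by hand for a continuous variable. The sketch below assumes the default datadist settings (25th and 75th percentiles for the effect range, the median as the adjustment value) and uses only base R plus the lung data from survival. `age_limits` is an illustrative name, not part of rms.

```r
library(survival)  # provides the lung data set

# Assumption: datadist() defaults, i.e. 25th/75th percentiles for the
# effect range and the median as the "Adjust to" value.
age_limits <- quantile(lung$age, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)
names(age_limits) <- c("Low:effect", "Adjust to", "High:effect")
print(age_limits)
```

If the assumption holds, the three values should agree with the age column of the `$limits` table (56, 63 and 69).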

2. cph {rms}

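We use the cph function from the rms package to build the Cox regression model. The covariates are the ones that were significant in the earlier Cox regression analysis. After fitting, sex and ph.ecog turn out to be the two most significant variables, as shown below: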

cph <- cph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno, data = lung, x = TRUE,
y = TRUE, surv = TRUE)
## Frequencies of Missing Values Due to Each Variable
## Surv(time, status)                age                sex            ph.ecog
##                  0                  0                  0                  1
##           ph.karno
##                  1
## Cox Proportional Hazards Model
## cph(formula = Surv(time, status) ~ age + sex + ph.ecog + ph.karno,
##     data = lung, x = TRUE, y = TRUE, surv = TRUE)
##                      Model Tests      Discrimination
##                                              Indexes
## Obs       226    LR chi2     31.27    R2       0.129
## Events    163    d.f.            4    Dxy      0.263
## Center 2.2049    Pr(> chi2) 0.0000    g        0.550
##                  Score chi2  31.06    gr       1.732
##                  Pr(> chi2) 0.0000
##          Coef    S.E.   Wald Z Pr(>|Z|)
## age       0.0129 0.0094  1.37  0.1712
## sex=2    -0.5726 0.1692 -3.38  0.0007
## ph.ecog   0.6329 0.1760  3.60  0.0003
## ph.karno  0.0126 0.0095  1.32  0.1870
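
The cph printout gives coefficients on the log-hazard scale only. As a minimal sketch, the hazard ratios and approximate 95% Wald intervals can be derived from the printed numbers with base R. The values are copied by hand from the output above, and `hr_table` is an illustrative name.

```r
# Coefficients and standard errors copied from the cph output above
coefs <- c(age = 0.0129, "sex=2" = -0.5726, ph.ecog = 0.6329, ph.karno = 0.0126)
ses   <- c(age = 0.0094, "sex=2" = 0.1692, ph.ecog = 0.1760, ph.karno = 0.0095)

hr    <- exp(coefs)                # hazard ratio per unit increase
lower <- exp(coefs - 1.96 * ses)   # approximate 95% Wald interval, lower bound
upper <- exp(coefs + 1.96 * ses)   # approximate 95% Wald interval, upper bound

hr_table <- round(cbind(HR = hr, lower = lower, upper = upper), 3)
print(hr_table)
```

Up to rounding, the HR column should agree with the exp(coef) column that coxph prints for the same model.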

3. coxph {survival}

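We use the coxph function from the survival package to build the Cox regression model with the same covariates. Again, sex and ph.ecog are the two most significant variables. Unlike the cph printout, where the overall p-value is rounded to 0.0000, coxph reports the p-value of the overall likelihood ratio test in full (p=2.695e-06), as shown below: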

coxph <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno, data = lung)
## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + ph.karno,
## data = lung)
##               coef exp(coef)  se(coef)      z        p
## age       0.012868  1.012951  0.009404  1.368 0.171226
## sex2     -0.572802  0.563943  0.169222 -3.385 0.000712
## ph.ecog   0.633077  1.883397  0.176034  3.596 0.000323
## ph.karno  0.012558  1.012637  0.009514  1.320 0.186842
## Likelihood ratio test=31.27 on 4 df, p=2.695e-06
## n= 226, number of events= 163
## (2 observations deleted due to missingness)
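
The overall p-value can be recomputed from the likelihood ratio statistic, which also shows why cph displays 0.0000 for the same test. This is a minimal sketch using the rounded statistic from the printout, so the result matches the reported 2.695e-06 only approximately.

```r
# Likelihood ratio statistic and degrees of freedom as printed above
lr_chi2 <- 31.27
df      <- 4

# Upper-tail chi-square probability = p-value of the overall test
p_value <- pchisq(lr_chi2, df = df, lower.tail = FALSE)
print(signif(p_value, 3))
```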


1. nomogram {rms}

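The nomogram function in the rms package takes a regression model fitted with rms as input. The package offers a fairly complete set of model-fitting functions, which cover most needs. The documentation describes the argument as follows: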

fit: a regression model fit that was created with rms, and (usually) with options(datadist = "object.name") in effect.

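Based on the regression model obtained from cph in the rms package, we draw the nomogram, here showing the estimated 1-year and 2-year survival probabilities: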


# Draw the nomogram
survival <- Survival(cph)
survival1 <- function(x) survival(365, x) # survival probability at 1 year (365 days)
survival2 <- function(x) survival(730, x) # survival probability at 2 years (730 days)
nom <- nomogram(cph, fun = list(survival1, survival2),
                fun.at = c(0.05, seq(0.1, 0.9, by = 0.05), 0.95),
                funlabel = c("1 year survival", "2 year survival"))
plot(nom)
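
For intuition about what survival1 and survival2 return: under a proportional hazards model, the survival probability at a fixed time is the baseline survival raised to the power exp(linear predictor). The sketch below is illustrative only. `surv_from_lp` and the baseline value 0.5 are made up for the example, and rms's Survival() handles the baseline estimate and the centering of the linear predictor internally.

```r
# Hypothetical helper: Cox model relation S(t | x) = S0(t)^exp(lp)
surv_from_lp <- function(s0, lp) s0^exp(lp)

s0 <- 0.5                  # made-up baseline survival at some time t
lp <- c(-1, 0, log(2))     # example linear predictor values
surv <- surv_from_lp(s0, lp)
print(round(surv, 3))
```

A linear predictor of 0 returns the baseline itself, while a hazard twice the baseline (lp = log(2)) squares it, so higher-risk patients map to lower values on the survival axes of the nomogram.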

2. regplot {regplot}


Creates a nomogram representation of a fitted regression. The regression object reg can be of different types from the stats, survival, rms, MASS and lme4 libraries. Specifically models generated by the commands: glm, Glm, lm, ols, lrm, survreg, psm, coxph, cph, glm.nb, polr or mixed model regressions lmer, glmer, and glmer.nb. For glm, Glm and glmer the supported family/link pairings are: gaussian/identity, binomial/logit, quasibinomial/logit, poisson/log and quasipoisson/log. For ordinal regression, using polr, logit and probit models are supported. For survreg and psm the distribution may be lognormal, gaussian, weibull, exponential or loglogistic. For glm.nb (from package MASS) and glmer.nb only log-link is allowed.


if (!require(regplot)) { install.packages("regplot"); library(regplot) }

Here we use the regression model built with coxph. The function also prints the points assigned to each variable. Finally we draw the nomogram, as follows:

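regplot(coxph,
        observation = lung[6, ],          # optional: the observation to highlight
        plots = c("density", "no plot"),
        failtime = c(365, 730),           # time points in days (1 and 2 years)
        prfail = TRUE,                    # set to TRUE for Cox regression
        showP = TRUE,                     # whether to mark statistical significance
        # droplines = FALSE,              # whether to draw drop lines for the observation's scores
        # colors = mycol,                 # use custom colors defined beforehand
        rank = "range",                   # order the variables by the range of their point contributions
        title = "Cox regression"
)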
## [[1]]
## ph.karno Points
## 1 50 10
## 2 70 24
## 3 90 37
## [[2]]
## ph.ecog Points
## 1 0.0 0
## 2 0.5 17
## 3 1.0 33
## 4 1.5 50
## 5 2.0 67
## 6 2.5 83
## 7 3.0 100
## [[3]]
## sex Points
## sex1 1 32
## sex2 2 1
## [[4]]
## age Points
## 1 35 13
## 2 45 20
## 3 55 27
## 4 65 33
## 5 75 40
## 6 85 47
## [[5]]
## Total Points Pr( time < 365 )
## 1 40 0.1913
## 2 60 0.2669
## 3 80 0.3648
## 4 100 0.4850
## 5 120 0.6210
## 6 140 0.7579
## 7 160 0.8743
## 8 180 0.9518
## 9 200 0.9881
## 10 220 0.9985
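
The point scales printed above are consistent with a common nomogram convention: the variable whose contribution to the linear predictor has the widest range (here ph.ecog, from 0 to 3) spans 0 to 100 points, and the other variables are scaled by the same factor. The sketch below checks this against the coxph coefficients. It is an assumption about the scaling, not a reproduction of regplot's internals, and it does not try to recover the offsets (for example, why age 35 maps to 13 points). `points_per_lp` and `step_points` are illustrative names.

```r
# Coefficients copied from the coxph output
beta <- c(age = 0.012868, sex2 = -0.572802, ph.ecog = 0.633077, ph.karno = 0.012558)

# Assumption: ph.ecog over its plotted range 0..3 defines the 100-point span
points_per_lp <- 100 / (abs(beta[["ph.ecog"]]) * (3 - 0))

# Point differences implied for steps that appear in the printed tables
step_points <- c(
  age_per_10_years = beta[["age"]] * 10 * points_per_lp,
  karno_per_20     = beta[["ph.karno"]] * 20 * points_per_lp,
  sex_1_vs_2       = abs(beta[["sex2"]]) * points_per_lp
)
print(round(step_points, 1))
```

These steps can be compared with the differences between neighbouring rows of the printed tables: about 7 points per 10 years of age, 13 to 14 points per 20 units of ph.karno, and 31 points between the two sexes, with small gaps explained by rounding to whole points.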



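Next, look at the sixth patient, whom we selected as the observation, and predict that patient's risk of death (Pr( time < 365 ) in the output above). The model involves four variables: sex, age, ph.ecog and ph.karno. regplot printed the points for each of them, so we can also work out the score by hand. The patient's record is: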

lung[6, ]
##   inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 6   12 1022      1  74   1       1       50        80      513       0
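
The hand calculation can be sketched with the point tables that regplot printed. The values below are copied from that output. Age 74 falls between two printed rows, so it is linearly interpolated, and the final probability is a rough read-off because the probability scale is not linear between ticks. All object names are illustrative.

```r
# Point tables copied from the regplot output
age_tab   <- data.frame(age = c(35, 45, 55, 65, 75, 85),
                        points = c(13, 20, 27, 33, 40, 47))
total_tab <- data.frame(total = seq(40, 220, by = 20),
                        pr365 = c(0.1913, 0.2669, 0.3648, 0.4850, 0.6210,
                                  0.7579, 0.8743, 0.9518, 0.9881, 0.9985))

# Patient 6: age 74, sex 1, ph.ecog 1, ph.karno 50
pts_age   <- approx(age_tab$age, age_tab$points, xout = 74)$y  # linear interpolation
pts_sex   <- 32   # sex = 1
pts_ecog  <- 33   # ph.ecog = 1.0
pts_karno <- 10   # ph.karno = 50
total_points <- pts_age + pts_sex + pts_ecog + pts_karno

# Rough read-off of Pr(time < 365) from the total-points table
pr_death_1y <- approx(total_tab$total, total_tab$pr365, xout = total_points)$y
print(c(total_points = total_points, pr_death_1y = round(pr_death_1y, 2)))
```

With these tables the patient scores roughly 114 total points, which corresponds to a probability of death within 365 days of about 0.58.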



