R语言机器学习与临床预测模型68--绘制带置信区间的预测模型曲线

2022-07-13  本文已影响0人  科研私家菜

R小盐准备介绍R语言机器学习与预测模型的学习笔记, 快来收藏关注【科研私家菜】


01 预测模型结果可视化

在rms工具包中,使用Predict()函数预测的模型结果可以直接使用plot()或ggplot()函数进行绘制,并自带置信区间,但是建模函数必须使用rms工具包中的相关函数。
对线性模型而言,基础包stats中的函数是lm(),而rms工具包中是ols()函数。


ols(formula, data=environment(formula),
    weights, subset, na.action=na.delete, 
    method="qr", model=FALSE,
    x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE,
    penalty=0, penalty.matrix, tol=1e-7, sigma,
    var.penalty=c('simple','sandwich'), ...)

library(rms)
model.1 <- lm(mpg ~ wt, data = mtcars)
model.2 <- ols(mpg ~ wt, data = mtcars)
 
summary(model.1)


print(model.2)

02 stats包的使用

在stats包中,与模型预测相关的有两个函数:fitted()和predict()。前者是根据模型结果对原数据中的样本的响应变量(因变量)进行预测:

head(fitted(model.1))
predict()函数则可以对新数据的样本进行预测:

## S3 method for class 'lm'
predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
        interval = c("none", "confidence", "prediction"),
        level = 0.95, type = c("response", "terms"),
        terms = NULL, na.action = na.pass,
        pred.var = res.var/weights, weights = 1, ...)

object:模型对象;

newdata:新数据;必须为数据框结构;若省略,则默认为原数据;

interval:区间类型,有两种类型:置信区间(confidence)和预测区间(prediction);

level:置信水平;默认为95%。

predict(model.1, data.frame(wt = seq(1,6,0.5)),
        interval = "prediction")
predict(model.1, data.frame(wt = seq(1,6,0.5)),
        interval = "confidence")

而在rms工具包中,模型预测对应的是Predict()函数(首字母大写):

Predict(object, ..., fun=NULL, funint=TRUE,
        type = c("predictions", "model.frame", "x"),
        np = 200, conf.int = 0.95,
        conf.type = c("mean", "individual","simultaneous"),
        usebootcoef=TRUE, boot.type=c("percentile", "bca", "basic"),
        posterior.summary=c('mean', 'median', 'mode'),
        adj.zero = FALSE, ref.zero = FALSE,
        kint=NULL, ycut=NULL, time = NULL, loglog = FALSE,
        digits=4, name,
        factors=NULL, offset=NULL)
# conf.type:区间类型;置信区间(mean)、预测区间(individual)
Predict()函数的“新数据”使用...表示,直接写出参与预测的变量即可:

predict.2 <- Predict(model.2, wt = seq(1,6,0.5)) 
predict.2

也可以不指定参与预测变量的数值,但需要提前定义变量的范围和等分的个数。指定范围的方法如下:

ddist <- datadist(mtcars)
options(datadist = "ddist")
# 等分数目由np参数确定,默认值为200:

Predict(model.2, wt, np = 10)

03 预测模型置信区间绘图

Predict()函数预测的结果可以直接使用plot()函数和ggplot()函数进行绘制。在加载rms工具包时会自动加载ggplot2工具包。
需要注意的是,rms包中的plot()函数采用的是lattice绘图系统,而非基础绘图系统(graphics),因此不能叠加graphics包中的函数。

plot(predict.2)
ggplot(predict.2)
ggplot(predict.2) +
  geom_point(data = mtcars, aes(wt, mpg))

效果如下:


model.3 <- ols(mpg ~ pol(wt,2), data = mtcars)
predict.3 <- Predict(model.3, wt = seq(1,6,0.5)) 
predict.3
 
ggplot(predict.3) +
  geom_point(data = mtcars, aes(wt, mpg))
ddist <- datadist(mtcars)
options(datadist = "ddist")
model.4 <- ols(mpg ~ wt + drat, data = mtcars)
predict.41 <- Predict(model.4, np = 10)
plot(predict.41)
## 使用单个变量进行预测:
predict.42 <- Predict(model.4, wt = seq(1,6,0.5))
plot(predict.42)
# 未参与预测的变量实际上取的是中位数:
median(mtcars$drat)
## [1] 3.695
# 也可以指定未参与预测变量的取值:

predict.43 <- Predict(model.4, wt = seq(1,6,0.5),
                      drat = 4,
                      np = 10)
plot(predict.43)

rms包中的模型函数基本上是自成体系的,如logistic回归用lrm()函数,以及一系列的样条曲线函数,如rcs()函数(linear tail-restricted cubic spline function)。

ddist <- datadist(mtcars)
options(datadist = "ddist")
model.5 <- ols(mpg ~ rcs(wt) + drat, data = mtcars)
predict.5 <- Predict(model.5, wt)
plot(predict.5)

04 ROC曲线95%置信区间

library(pROC)
data(aSAH)
rocobj <- plot.roc(aSAH$outcome, aSAH$s100b,
                   
                   main="Confidence intervals", percent=TRUE,
                   
                   ci=TRUE, # compute AUC (of AUC by default)
                   
                   print.auc=TRUE) # print the AUC (will contain the CI)
 
ciobj <- ci.se(rocobj, # CI of sensitivity
               
               specificities=seq(0, 100, 5)) # over a select set of specificities
 
plot(ciobj, type="shape", col="#1c61b6AA") # plot as a blue shape
plot(ci(rocobj, of="thresholds", thresholds="best")) # add one threshold
library(pROC)
library(ROCR)
rocobj <- roc(aSAH$outcome, aSAH$s100b,auc = TRUE,
              
              ci=TRUE, # compute AUC (of AUC by default)
              
              print.auc=TRUE) # print the AUC (will contain the CI)

ciobj <- ci.se(rocobj, # CI of sensitivity
               
               specificities=seq(0, 1, 0.01)) # over a select set of specificities

auc<-auc(rocobj)[1]
auc_low<-ci(rocobj,of="auc")[1]
auc_high<-ci(rocobj,of="auc")[3]
auc_full<-paste("AUC:",round(auc,digits = 3),"(",
                round(auc_low,digits = 3),",",round(auc_high,digits = 3),")",sep = "")

data_ci<-ciobj[1:101,1:3]
data_ci<-as.data.frame(data_ci)

library(ggplot2)
ggroc(rocobj,color="red",size=1)+theme_bw()+
  geom_segment(aes(x = 1, y = 0, xend = 0, yend = 1), 
               colour='grey', linetype = 'dotdash') +
  
  geom_ribbon(data = data_ci,aes(x=as.numeric(rownames(data_ci)),ymin=`2.5%`,ymax=`97.5%`), fill = 'lightblue',alpha=0.5)+
  # geom_ribbon(data = data_ci,aes(x=x,ymin=`2.5%`,ymax=`97.5%`), fill = 'lightblue',alpha=0.5)+
  theme(plot.title = element_text(hjust = 0.5), 
        legend.justification=c(1, 0), legend.position=c(.95, .05),
        #legend.title=title, 
        legend.background = element_rect(fill=NULL, size=0.5, 
                                         linetype="solid", colour ="black"))+
  labs(x="Specificity",y="Sensitivity")

参考资料

(1条消息) rms | 如何绘制模型带置信区间的预测曲线_R语言学堂的博客-CSDN博客
https://link.springer.com/content/pdf/bfm%3A978-3-319-19425-7%2F1.pdf
绘制ROC曲线95%置信区间 - 灰信网(软件开发博客聚合) (freesion.com)


关注R小盐,关注科研私家菜(VX_GZH: SciPrivate),有问题请联系R小盐。让我们一起来学习 R语言机器学习与临床预测模型

上一篇下一篇

猜你喜欢

热点阅读