R语言代码R data manipulateR for statistics

Restricted cubic spline(RCS):限制性

2021-07-03  本文已影响0人  dasdadf
spline

当我们的自变量和因变量之间不是简单的线性关系,我们还可以通过多项式回归,多元线性回归等方法构造非线性的关系模型,限制性立方样条(RCS)也是一种选择。

什么是Restricted cubic spline

实在是不了解这个东西到底是怎么翻译成的,因为仅仅从他的中文译名来看限制性立方样条,我们可能会有这样的疑惑:这里的每个汉字我都认识,但连在一起到底是个什么玩意儿?

要了解什么是Restricted cubic spline(限制性立方样条),就要先了解什么是spline(样条函数)

样条函数
在计算机科学的计算机辅助设计和计算机图形学中,样条通常是指分段定义的多项式参数曲线。由于样条构造简单,使用方便,拟合准确,并能近似曲线拟合和交互式曲线设计中复杂的形状,样条是这些领域中曲线的常用表示方法。

样条函数里面的关键词分段,然后再加上限制性立方样条中的关键词限制,是我们理解RCS的关键所在。同一般的非线性回归相比,限制性立方样条是分段的,每一区间有其自己的函数,且最两侧的区间都为线性函数。

示例代码

下面是一个使用Cox模型进行限制性立方样条分析的脚本,你可以将里面的模型替换为自己想用的模型~

options(warn = -1)
suppressPackageStartupMessages({
  library(optparse)
  library(stringr)
  library(ggplot2)
  library(rms)
  library(logging)
})
basicConfig()

option_list <- list(
  make_option(c("-t", "--table"),
              type = "character",
              help = "The input table"),
  make_option(c("--time"),
              type = "character",
              help = "The time tag"),
  make_option(c("--dv"),
              type = "character",
              help = "The dependent variable tag name"),
  make_option(c("--iv"),
              type = "character",
              help = "The Independent variable tag name(x)"),
  make_option(c("--covariate"),
              type = "character",
              help = "The covariate variable tag names(comma split)"),
  make_option(
    c("--knots"),
    default = 3,
    type = "integer",
    help = "The number of knots[default= %default]"
  ),
  make_option(
    c("--prefix"),
    default = "res",
    type = "character",
    help = "The out put prefix[default= %default]"
  )
)
opts <- parse_args(
  OptionParser(usage = "%prog [options]",
               option_list = option_list,
               description = "\n使用Cox比例风险回归模型进行限制性立方样条(Restricted cubic spline)分析"),
  positional_arguments = F
)

#### Some Functions ####
RCS_plot <- function(data, x) {
  p <- ggplot() +
    geom_line(
      data = data,
      aes_string(x, "yhat"),
      linetype = "solid",
      size = 1,
      alpha = 0.7,
      color = "#B521F5"
    ) +
    geom_line(
      data = data,
      aes_string(x, "lower"),
      linetype = "dashed",
      size = 1,
      alpha = 0.7,
      color = "#B521F5"
    ) +
    geom_line(
      data = data,
      aes_string(x, "upper"),
      linetype = "dashed",
      size = 1,
      alpha = 0.7,
      color = "#B521F5"
    ) +
    geom_hline(
      yintercept = 1,
      linetype = "solid",
      size = 0.5,
      color = "black"
    ) +
    ylab("HR(95%CI)\n") +
    xlab(str_c("\n", x)) +
    theme_classic() +
    theme(text = element_text(size = 20))
  p
}
########################
loginfo("Read the input table")
data <- read.csv(opts$table, sep = "\t", check.names = F)

# Prepare
loginfo("Prepare the data")
dd <- datadist(data)
options(datadist = 'dd')

# Model
loginfo("Fit the model")
y <- str_c("Surv(", opts$time, ",", opts$dv, ")")
x <- str_c("rcs(", opts$iv, ',', opts$knots, ')')
if (!is.null(opts$covariate)) {
  list_cv <- str_split(opts$covariate, pattern = ',')[[1]]
  cv <- paste(list_cv, sep = ',')
  x <- paste(c(x, cv), sep = ",")
}
form <- as.formula(paste(y, "~", x))
fit <- cph(form, data = data)

# Predict
loginfo("Predict")
HR <- Predict(fit, WBC, fun = exp, ref.zero = TRUE)

# Plot
loginfo("Plot")
p <- RCS_plot(HR, opts$iv)
ggsave(
  str_c(opts$prefix, "pdf", sep = '.'),
  plot = p,
  width = 8,
  height = 7
)

名词解释

hazard ratio

hazard ratio
风险比率,正式的英文名称是Hazard Ratio。风险比率是两个风险率(Hazard Rates)的比值。风险率是单位时间内发生的事件数占被试总体的百分比。

上一篇下一篇

猜你喜欢

热点阅读