
Machine Learning -- Supervised -- GBM (Boosting)

2021-11-13  小贝学生信

Ensemble learning combines multiple machine learning models to make a joint prediction, with the goal of improving model performance. It comes in two main flavors: bagging and boosting. The random forest covered earlier is the typical representative of bagging; this post looks at the other ensemble idea, boosting, with gradient boosting machines (GBM) as the representative. In both cases, the base learners are usually decision trees.


1. The difference between bagging and boosting

(1) Bagging: many base learners are trained independently and in parallel, each on a bootstrap sample of the training data, and their predictions are averaged (or combined by voting). Random forest is the typical example.

(2) Boosting: base learners are trained sequentially, with each new learner focused on the errors the current ensemble still makes, and the models are combined additively. The individual trees are usually shallow (weak learners).

2. A quick look at GBM (gradient boosting machines)

2.1 Gradient descent

Gradient boosting can be viewed as gradient descent in function space: at each step a new tree is added that moves the model's predictions a small amount in the direction that most reduces the loss. The name gradient boosting machine comes from the fact that this procedure can be generalized to loss functions other than SSE.
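
As a toy illustration of the idea (only a sketch, not how the gbm package is implemented): with SSE loss the negative gradient is simply the residual y - F(x), so each boosting iteration fits a shallow tree to the current residuals and adds a small fraction of its predictions.

# toy gradient boosting with SSE loss (illustration only)
simple_gbm <- function(x, y, n_trees = 100, shrinkage = 0.1) {
  # x: data.frame of predictors; y: numeric response
  pred  <- rep(mean(y), length(y))            # F_0: start from the mean of y
  trees <- vector("list", n_trees)
  for (m in seq_len(n_trees)) {
    r <- y - pred                             # negative gradient of SSE = residual
    trees[[m]] <- rpart::rpart(
      r ~ ., data = x,
      control = rpart::rpart.control(maxdepth = 2, cp = 0))
    pred <- pred + shrinkage * predict(trees[[m]], x)  # small step along the gradient
  }
  list(init = mean(y), trees = trees, shrinkage = shrinkage, fitted = pred)
}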

2.2 GBM hyperparameters

GBM hyperparameters fall into two groups: parameters that control the boosting process itself, and the hyperparameters of the individual decision trees.

(1) Boosting hyperparameters: the number of trees and the learning rate (shrinkage).
(2) Tree hyperparameters: the depth of each tree and the minimum number of observations in a terminal node.

Hyperparameter tuning strategy: fix a fairly large number of trees, tune the learning rate first, then tune the tree-specific parameters, and finally refit the chosen combination and evaluate it on the test set. This is the order followed in the code below; a reference mapping of the hyperparameters to gbm() arguments is sketched right after this paragraph.
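
For orientation, here is how the two groups map onto gbm() arguments (the values shown are simply the starting settings used in step 1 below, not recommendations):

gbm(
  formula = Sale_Price ~ .,
  data = ames_train,
  distribution = "gaussian",  # SSE loss
  n.trees = 5000,             # boosting: total number of trees
  shrinkage = 0.1,            # boosting: learning rate
  interaction.depth = 3,      # tree: maximum depth of each tree
  n.minobsinnode = 10,        # tree: minimum observations in a terminal node
  cv.folds = 10               # 10-fold cross-validation of the error
)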

3. Hands-on code

Example data: predicting house prices (Ames housing)
ames <- AmesHousing::make_ames()
dim(ames)
## [1] 2930   81

set.seed(123)
library(rsample)
split <- initial_split(ames, prop = 0.7, 
                       strata = "Sale_Price")
ames_train  <- training(split)
# [1] 2049   81
ames_test   <- testing(split)
# [1] 881  81

library(gbm)
Step 1: a rough first fit
set.seed(123) # for reproducibility
ames_gbm1 <- gbm(
  formula = Sale_Price ~ .,
  data = ames_train,
  distribution = "gaussian", # SSE loss function
  n.trees = 5000,
  shrinkage = 0.1,
  interaction.depth = 3,
  n.minobsinnode = 10,
  cv.folds = 10)

# find the index (number of trees) with the minimum CV error
best <- which.min(ames_gbm1$cv.error)
# [1] 1119

# get MSE and compute RMSE
sqrt(ames_gbm1$cv.error[best])
## [1] 22402.07

# plot error curve
gbm.perf(ames_gbm1, method = "cv")

The error curve from gbm.perf() shows that the cross-validation error has essentially plateaued by around 1,119 trees.
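
gbm.perf() also returns the estimated optimal number of trees, so it can be captured directly instead of taking which.min() of the CV error by hand:

# returns the estimated best iteration; plot.it = FALSE skips the error-curve plot
best_n_trees <- gbm.perf(ames_gbm1, method = "cv", plot.it = FALSE)
best_n_trees
# should match the which.min() result above (1119)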


Step 2: tuning the learning rate
# create grid search
hyper_grid <- expand.grid(
  learning_rate = c(0.3, 0.1, 0.05, 0.01, 0.005),
  RMSE = NA,
  trees = NA,
  time = NA
)
# execute grid search
for(i in seq_len(nrow(hyper_grid))) {
  # fit gbm
  set.seed(123) # for reproducibility
  train_time <- system.time({
    m <- gbm(
      formula = Sale_Price ~ .,
      data = ames_train,
      distribution = "gaussian",
      n.trees = 5000,
      shrinkage = hyper_grid$learning_rate[i],
      interaction.depth = 3,
      n.minobsinnode = 10,
      cv.folds = 10
    )
  })
  # record the CV RMSE, optimal number of trees, and training time
  hyper_grid$RMSE[i] <- sqrt(min(m$cv.error))
  hyper_grid$trees[i] <- which.min(m$cv.error)
  hyper_grid$time[i] <- train_time[["elapsed"]]
}

dplyr::arrange(hyper_grid, RMSE)
#   learning_rate     RMSE trees  time
# 1         0.050 21807.96  1565 66.83
# 2         0.010 22102.34  4986 66.73
# 3         0.100 22402.07  1119 67.84
# 4         0.005 23054.68  4995 66.04
# 5         0.300 24411.95   269 64.84
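
A learning rate of 0.05 gives the lowest CV RMSE here, so it is carried forward to step 3. For a quick visual comparison of the grid-search results (assuming ggplot2 is installed), something like the following works:

library(ggplot2)
ggplot(hyper_grid, aes(factor(learning_rate), RMSE)) +
  geom_col() +
  labs(x = "learning rate (shrinkage)", y = "10-fold CV RMSE")
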
Step 3: tuning the tree-specific parameters
# search grid
hyper_grid <- expand.grid(
  n.trees = 5000,
  shrinkage = 0.05,
  interaction.depth = c(3, 5, 7),
  n.minobsinnode = c(5, 10, 15)
)

# create model fit function
model_fit <- function(n.trees, shrinkage, interaction.depth, n.minobsinnode) {
  set.seed(123)
  m <- gbm(
    formula = Sale_Price ~ .,
    data = ames_train,
    distribution = "gaussian",
    n.trees = n.trees,
    shrinkage = shrinkage,
    interaction.depth = interaction.depth,
    n.minobsinnode = n.minobsinnode,
    cv.folds = 10
  )
  # compute RMSE
  sqrt(min(m$cv.error))
}

# perform search grid with functional programming
hyper_grid$rmse <- purrr::pmap_dbl(
  hyper_grid,
  ~ model_fit(
    n.trees = ..1,
    shrinkage = ..2,
    interaction.depth = ..3,
    n.minobsinnode = ..4
  )
)
# results
dplyr::arrange(hyper_grid, rmse)
#   n.trees shrinkage interaction.depth n.minobsinnode     rmse
# 1    5000      0.05                 5             10 21793.28
# 2    5000      0.05                 3             10 21807.96
# 3    5000      0.05                 5              5 21976.76
# 4    5000      0.05                 3              5 22104.49
# 5    5000      0.05                 5             15 22156.30
# 6    5000      0.05                 3             15 22170.16
# 7    5000      0.05                 7             10 22268.51
# 8    5000      0.05                 7              5 22316.37
# 9    5000      0.05                 7             15 22595.51
Step 4: fit the final model and evaluate on the test set
ame_gbm <- gbm(
  formula = Sale_Price ~ .,
  data = ames_train,
  distribution = "gaussian",
  n.trees = 5000,
  shrinkage = 0.05,
  interaction.depth = 5,
  n.minobsinnode = 10,
  cv.folds = 10)
(best <- which.min(ame_gbm$cv.error))
# [1] 1305
sqrt(ame_gbm$cv.error[best])
# [1] 22475.02

# predict() automatically uses the CV-estimated optimal number of trees
pred <- predict(ame_gbm, ames_test)
ModelMetrics::rmse(ames_test$Sale_Price, pred)
# [1] 20010.21
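
To make the number of trees explicit, the optimal iteration found above can also be passed to predict() directly:

# same prediction, but with the number of trees stated explicitly
pred_best <- predict(ame_gbm, ames_test, n.trees = best)
ModelMetrics::rmse(ames_test$Sale_Price, pred_best)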

Assessing variable importance

vip::vip(ame_gbm) 
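
vip::vip() plots the model-based importance scores; the gbm package itself reports the same idea as relative influence through summary(), and plot.gbm() can draw the partial dependence of the response on a single predictor (Gr_Liv_Area is just one example variable from the Ames data):

# relative influence as computed by gbm itself
head(summary(ame_gbm, plotit = FALSE), 10)

# partial dependence on one predictor (example: Gr_Liv_Area)
plot(ame_gbm, i.var = "Gr_Liv_Area", n.trees = best)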

Because GBM is built on gradient descent, a loss surface that is not bowl-shaped can leave the optimization stuck in a local minimum. Stochastic gradient descent reduces this risk by fitting each tree on a random subsample of the training data, which makes it more likely to get close to the global minimum. In addition, XGBoost adds regularization that helps keep boosting from overfitting. Their details are left for a later post.
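
As an aside, gbm() already supports the stochastic variant through its bag.fraction argument (each tree is fit on a random subsample of the rows; the default is 0.5, so the fits above were already stochastic). A minimal sketch, reusing the settings chosen in step 4:

set.seed(123)
ames_sgb <- gbm(
  formula = Sale_Price ~ .,
  data = ames_train,
  distribution = "gaussian",
  n.trees = 5000,
  shrinkage = 0.05,
  interaction.depth = 5,
  n.minobsinnode = 10,
  bag.fraction = 0.5,   # fraction of rows randomly sampled for each tree
  cv.folds = 10)

Lowering bag.fraction injects more randomness into each tree's fit.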
