
9 A Detailed Guide to GBDT Parameter Tuning

2018-06-25

Original article: Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python by Aarshay Jain



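Unlike bagging algorithms, which can only reduce high variance in a model, boosting algorithms are very effective at controlling both bias and variance, and tend to be more effective overall.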




  1. How does boosting work?
  2. Understanding the GBM parameters
  3. Tuning the parameters (with a detailed example)


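Boosting combines a series of weak learners to improve the prediction accuracy of the overall model. At any step t, the outcomes are weighted according to the results of step t-1: observations that were predicted correctly receive a lower weight, and misclassified observations receive a higher weight. Regression problems are handled in a similar way.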


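  1. Figure 1: output of the first weak learner (read from left to right)

    • Initially all points have the same weight (shown by their size).
    • The decision boundary classifies two positive and five negative points correctly.
  2. Figure 2: output of the second weak learner

    • Points classified correctly in Figure 1 now carry a lower weight (smaller size), while the misclassified points carry a higher weight.
    • The model therefore focuses on the high-weight points, that is, the ones misclassified in the previous round. These are now classified correctly, but some of the other points are misclassified.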
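
The reweighting shown in the two figures can be sketched in code. The snippet below is only an illustration: it assumes the AdaBoost update rule and a hypothetical set of ten equally weighted points, three of which the first learner gets wrong. Neither the exact rule nor the point count comes from the article, and gradient boosting itself fits each new tree to the gradient of the loss rather than to explicit sample weights.

```python
import math

def reweight(weights, is_correct):
    """One AdaBoost-style weight update (illustrative, not scikit-learn code)."""
    # Weighted error rate of the current weak learner
    err = sum(w for w, ok in zip(weights, is_correct) if not ok)
    # Learner importance: larger when the weighted error is smaller
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink the weights of correct points, grow those of misclassified points
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, is_correct)]
    total = sum(updated)
    return [w / total for w in updated]

# Hypothetical example: 10 points with equal weight, the last 3 misclassified
weights = [0.1] * 10
is_correct = [True] * 7 + [False] * 3
new_weights = reweight(weights, is_correct)
print(round(new_weights[0], 4), round(new_weights[-1], 4))
```

Under these assumptions the correctly classified points drop to a weight of about 0.071 each, while the misclassified points rise to about 0.167 each, so the next learner concentrates on the points that were wrong.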





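  1. Tree-specific parameters control the properties of each individual decision tree in the model.
  2. Boosting parameters control the boosting operation in the model.
  3. Miscellaneous parameters control the overall functioning of the model.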



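  1. min_samples_split

    • The minimum number of samples an internal node must hold before it can be split.
    • Helps control over-fitting. If a split is based on too few samples, the model may learn relations that hold only for the training sample; requiring more samples avoids this.
    • Values that are too high can lead to under-fitting, so this parameter should be tuned using cross-validation (CV).
  2. min_samples_leaf

    • The minimum number of samples required in a leaf node.
    • Like min_samples_split, it is used to prevent over-fitting.
    • For imbalanced class problems it should generally be set to a small value, because the regions in which the minority class dominates tend to contain few samples.
  3. min_weight_fraction_leaf

    • Similar to min_samples_leaf, but expressed as a fraction of the total number of samples in a leaf rather than as an absolute count.
    • Only one of #2 and #3 needs to be defined.
  4. max_depth

    • The maximum depth of a tree.
    • Also controls over-fitting, since deeper trees are more likely to over-fit.
    • Should be tuned using CV as well.
  5. max_leaf_nodes

    • The maximum number of leaf nodes in a tree.
    • Can be defined in place of max_depth: a binary tree of depth n has at most 2^n leaf nodes.
    • According to the original article, if max_leaf_nodes is defined, GBM ignores max_depth (check this against the scikit-learn version you use).
  6. max_features

    • The number of features considered when searching for the best split; they are selected at random.
    • As a rule of thumb, the square root of the total number of features works well, but other values should be tried too, up to 30%-40% of the total number of features.
    • Too many features may also lead to over-fitting.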
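
Two of the rules of thumb above are simple arithmetic, and a short sketch makes them concrete. The helper names and the feature count of 49 are hypothetical and chosen only for illustration.

```python
import math

def max_leaves(depth):
    """Upper bound on the number of leaves of a binary tree of the given depth."""
    return 2 ** depth

def feature_candidates(n_features):
    """Rule-of-thumb max_features values: sqrt, then 30% and 40% of the total."""
    return {
        "sqrt": int(math.sqrt(n_features)),
        "30_percent": math.floor(0.3 * n_features),
        "40_percent": math.ceil(0.4 * n_features),
    }

# Hypothetical inputs: a tree of depth 8 and a data set with 49 features
print(max_leaves(8))
print(feature_candidates(49))
```

For 49 features the square-root rule gives 7, and the 30%-40% band spans roughly 14 to 20, which is the kind of range worth covering in a later search over max_features.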


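1. Initialize the outcome values for the targets
2. Iterate over all the trees:
    2.1 Update the weights of the targets based on the previous tree's results (higher weights for the misclassified ones)
    2.2 Fit the model on a subsample of the training data
    2.3 Make predictions for all observations with the fitted model
    2.4 Update the weights again according to the results
3. Return the final output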


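The steps above describe a greatly simplified GBM. The parameters covered so far affect step 2.2, fitting the model on a subsample, that is, the model-building step. Now for the parameters that govern the boosting process: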

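  1. learning_rate

    • Determines the impact of each tree on the final outcome (step 2.4, updating the weights). GBM starts from an initial estimate that is updated with the output of every tree, and learning_rate controls the magnitude of each update (this is the GBDT shrinkage factor discussed in chapter 8, Boosting).
    • Lower values are generally preferred: a small learning rate makes the model robust to the characteristics of individual trees, so their results combine better.
  2. n_estimators

    • The number of trees to be fitted (step 2).
    • Although GBM stays fairly robust with a large number of trees, it can still over-fit, so this should be tuned with CV for a particular learning rate.
  3. subsample

    • The fraction of observations used to fit each tree; the subsample is drawn at random.
    • Values slightly below 1 make the model more robust by reducing variance.
    • A value around 0.8 generally works fine, and tuning can improve on it.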
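
The interplay between learning_rate and n_estimators can be seen in a toy boosting loop. The sketch below is an assumption-laden illustration, not scikit-learn's implementation: it boosts one-split regression stumps on a hypothetical one-dimensional data set under squared error, and all function names are made up for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_estimators, learning_rate):
    """Return the training predictions after n_estimators boosting rounds."""
    base = sum(ys) / len(ys)  # initial estimate, before any tree is added
    preds = [base] * len(ys)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # learning_rate shrinks the contribution of every tree
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return preds

def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = list(range(10))
ys = [x * x for x in xs]  # hypothetical toy target
for n in (10, 100):
    print(n, round(mse(ys, boost(xs, ys, n, learning_rate=0.1)), 3))
```

With the same small learning rate, the training error after 100 rounds is lower than after 10, which is why a lower learning rate has to be paired with more trees. Training error alone says nothing about over-fitting, so the actual choice should still be made with CV.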



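  1. loss

    • The loss function to be minimized at each split.
    • It can take different values for classification and regression. The default usually works fine; choose another value only if you understand it and its effect on the model.
  2. init

    • Affects how the output is initialized.
    • Useful when the predictions of another model are to serve as the initial estimates for GBM.
  3. random_state

    • The seed for the random number generator.
    • Fixing the seed matters for tuning: with a different seed on every run, results differ even when the parameters stay the same, which makes models hard to compare.
    • Any particular random sample can lead to over-fitting. Fitting models on several different random samples reduces that risk, but it is computationally expensive and rarely done.
  4. verbose

    • The kind of output printed while the model is fitted:

      • 0: no output (default)
      • 1: output for trees at certain intervals
      • >1: output for every tree
  5. warm_start

    • An interesting option that can save a lot of work when used well.
    • It lets us fit additional trees on top of a model that has already been fitted, which saves a lot of time; it is worth exploring for advanced applications.
  6. presort

    • Whether to presort the data, which makes finding splits faster.
    • By default it is chosen automatically, but you can change it.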
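
Why a fixed random_state makes runs comparable can be shown with the row subsampling alone. In the sketch below, Python's random module stands in for the generator that the library uses, and draw_subsample is a hypothetical helper written for this illustration.

```python
import random

def draw_subsample(n_samples, subsample, seed):
    """Pick round(subsample * n_samples) row indices without replacement."""
    rng = random.Random(seed)
    k = int(round(subsample * n_samples))
    return sorted(rng.sample(range(n_samples), k))

# Same seed: identical rows are drawn, so two runs are directly comparable
a = draw_subsample(100, 0.8, seed=10)
b = draw_subsample(100, 0.8, seed=10)
# Different seed: a different subsample, so scores can shift for that reason alone
c = draw_subsample(100, 0.8, seed=11)
print(a == b, a == c)
```

When the seed changes between runs, a difference in score may come from the sample rather than from the parameters being tuned, which is the comparison problem described above.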


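The data set used below comes from the Data Hackathon 3.x AV hackathon. Details of the competition are on the competition page (http://datahack.analyticsvidhya.com/contest/data-hackathon-3x), and the data can be downloaded here: http://www.analyticsvidhya.com/wp-content/uploads/2016/02/Dataset.rar. I applied some cleaning steps to the data.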

You can see these changes in the data_preparation iPython notebook on GitHub.


import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import metrics
# cross_val_score and GridSearchCV both live in sklearn.model_selection;
# the old sklearn.cross_validation module has been removed from scikit-learn
from sklearn.model_selection import cross_val_score, GridSearchCV

import matplotlib.pylab as plt

# Jupyter/IPython magic: render plots inline (notebook only)
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 16, 9

train = pd.read_csv('train_modified.csv')
target = 'Disbursed'
IDcol = 'ID'

def modelfit(alg, dtrain, predictors, performCV=True, printFeatureImportance=True, cv_folds=5):
    # Fit the model on the training data
    alg.fit(dtrain[predictors], dtrain['Disbursed'])

    # Predict on the training set
    dtrain_predictions = alg.predict(dtrain[predictors])
    dtrain_predprob = alg.predict_proba(dtrain[predictors])[:, 1]

    # Cross-validation (scoring='roc_auc' is assumed here, matching the
    # scoring shown in the grid-search output later in this article)
    if performCV:
        cv_score = cross_val_score(alg, dtrain[predictors], dtrain['Disbursed'],
                                   cv=cv_folds, scoring='roc_auc')

    # Print the model report
    print("Model Report")
    print("Accuracy : {0:.4}".format(metrics.accuracy_score(dtrain['Disbursed'].values, dtrain_predictions)))
    print("AUC Score (Train): {0:.4}".format(metrics.roc_auc_score(dtrain['Disbursed'], dtrain_predprob)))

    if performCV:
        print("CV Score : Mean - {0:.7} | Std - {1:.7} | Min - {2:.7} | Max - {3:.7}".format(
            np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))

    # Plot the feature importances
    if printFeatureImportance:
        feat_imp = pd.Series(alg.feature_importances_, predictors).sort_values(ascending=False)
        feat_imp.plot(kind='bar', title='Feature Importances')
        plt.ylabel('Feature Importance Score')
Next, we create a baseline model.
#Choose all predictors except target & IDcols
predictors = [x for x in train.columns if x not in [target, IDcol]]
gbm0 = GradientBoostingClassifier(random_state=10) # baseline model with default parameters
modelfit(gbm0, train, predictors)     # alg==gbm0 dtrain==train
Model Report
Accuracy : 0.9856
AUC Score (Train): 0.8623
CV Score : Mean - 0.8318589 | Std - 0.008756969 | Min - 0.820805 | Max - 0.8438558


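5.1 General approach to parameter tuning

As mentioned earlier, there are two kinds of parameters to tune: tree parameters and boosting parameters. There is no special way to tune the learning rate, because lower values always work better as long as enough trees are trained.

Although GBM does not over-fit noticeably as trees are added, a high learning rate can still cause over-fitting. But if we keep lowering the learning rate and adding trees, the computation becomes very expensive and takes a long time to run. With this in mind, we adopt the following tuning strategy:

  1. Choose a relatively high learning rate. The default of 0.1 is usually fine, but anything between 0.05 and 0.2 can work depending on the problem.
  2. Determine the optimal number of trees for this learning rate. It should be in the range of 40-70. Pick a value your machine can still run quickly, because these trees will be used for many tests and tuning runs.
  3. Tune the tree parameters for the chosen learning rate and number of trees. Different parameters can be used to define a tree; examples follow below.
  4. Lower the learning rate and increase the number of trees accordingly to make the model more robust.

5.2 Fixing the learning rate and estimating the number of trees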


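  1. min_samples_split=500: should be roughly 0.5-1% of the total number of observations. Since this is an imbalanced class problem, we take a smaller value from that range, 500.
  2. min_samples_leaf=50: chosen by intuition, with the aim of preventing over-fitting. Again a small value is used because of the class imbalance.
  3. max_depth=8: should be between 5 and 8, depending on the number of observations and predictors. The data has 87,000 rows and 49 columns, so we start with 8.
  4. max_features='sqrt': the square root is the usual rule-of-thumb choice.
  5. subsample=0.8: a commonly used starting value.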
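
The starting range for min_samples_split follows directly from the row count. The helper below is a hypothetical illustration of that arithmetic, using the 87,000 rows quoted above.

```python
def min_samples_split_range(n_rows, low=0.005, high=0.01):
    """Return the 0.5%-1% band of the row count as integer sample counts."""
    return round(n_rows * low), round(n_rows * high)

low, high = min_samples_split_range(87000)
print(low, high)
```

The band runs from 435 to 870 samples, so the chosen value of 500 sits near its lower end, in line with the reasoning about class imbalance.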


We can now take the default learning rate of 0.1 and find the optimal number of trees for it. A grid search does this, testing values from 20 to 80 in steps of 10.

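# Grid search over n_estimators, from 20 to 80 in steps of 10
# Choose all predictors except target & IDcols

predictors = [x for x in train.columns if x not in [target, IDcol]]
param_test1 = {'n_estimators':range(20,81,10)}
# Note: the iid argument only exists in older scikit-learn releases
gsearch1 = GridSearchCV(estimator = GradientBoostingClassifier(learning_rate=0.1, min_samples_split=500,
                                                               min_samples_leaf=50, max_depth=8,
                                                               max_features='sqrt', subsample=0.8,
                                                               random_state=10),
                        param_grid = param_test1,
                        scoring='roc_auc', n_jobs=4,
                        iid=False, cv=5)
gsearch1.fit(train[predictors], train[target])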

GridSearchCV(cv=5, error_score='raise',
       estimator=GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=8,
              max_features='sqrt', max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=50, min_samples_split=500,
              min_weight_fraction_leaf=0.0, n_estimators=100,
              presort='auto', random_state=10, subsample=0.8, verbose=0,
       fit_params=None, iid=False, n_jobs=4,
       param_grid={'n_estimators': range(20, 81, 10)},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='roc_auc', verbose=0)


[mean: 0.83337, std: 0.00991, params: {'n_estimators': 20},
 mean: 0.83697, std: 0.00994, params: {'n_estimators': 30},
 mean: 0.83832, std: 0.01050, params: {'n_estimators': 40},
 mean: 0.83867, std: 0.01081, params: {'n_estimators': 50},
 mean: 0.83939, std: 0.01077, params: {'n_estimators': 60},
 mean: 0.83891, std: 0.01044, params: {'n_estimators': 70},
 mean: 0.83807, std: 0.01093, params: {'n_estimators': 80}]
{'n_estimators': 60}

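For a learning rate of 0.1, 60 trees is optimal, and 60 is also a reasonable number of trees, so we use it as is. In some cases, though, the code above may not give a usable result, for example:

  1. If the result is 20, you may want to lower the learning rate to 0.05 and run the search again.
  2. If the result is too high, say 100, tuning the other parameters will take a long time, so you can raise the learning rate slightly.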
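
The decision rule in this list can be written down as a small helper. The scores are the mean CV values from the grid-search output above; the function name and the cutoff of 100 trees are hypothetical choices made for this sketch.

```python
# Mean CV AUC per n_estimators, copied from the grid-search output above
scores = {20: 0.83337, 30: 0.83697, 40: 0.83832, 50: 0.83867,
          60: 0.83939, 70: 0.83891, 80: 0.83807}

def suggest_next_step(scores, too_many=100):
    """Apply the rule of thumb; `too_many` is a hypothetical cutoff."""
    best_n = max(scores, key=scores.get)
    if best_n == min(scores):
        # Optimum at the lower edge of the grid: the learning rate is too high
        return best_n, "lower the learning rate and search again"
    if best_n >= too_many:
        # Too many trees would make every later tuning step slow
        return best_n, "raise the learning rate slightly and search again"
    return best_n, "keep this learning rate and tune the tree parameters"

print(suggest_next_step(scores))
```

For the scores reported here the helper selects 60 trees and recommends moving on to the tree parameters, which is what the article does next.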

5.3 Tuning tree parameters


  1. Tune max_depth and min_samples_split
  2. Tune min_samples_leaf
  3. Tune max_features


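Important: the grid searches that follow are fairly heavy and can take 15-30 minutes. When trying this yourself, adjust the number of values tested to what your machine can handle.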
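
To judge how heavy a grid search will be before running it, count the combinations. The sketch below uses the value ranges of the first tree-parameter search in this section; the fold count of 5 is assumed to match the earlier search.

```python
from itertools import product

param_grid = {
    "max_depth": range(5, 16, 2),
    "min_samples_split": range(200, 1001, 200),
}
cv_folds = 5  # assumed: same number of folds as the earlier search

# Every combination is cross-validated, so the cost is combinations x folds
combos = list(product(*param_grid.values()))
print(len(combos), "combinations,", len(combos) * cv_folds, "model fits")
```

Six depths times five split sizes gives 30 combinations and 150 cross-validation fits, so trimming either range is the quickest way to cut the running time.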


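# Tune the tree parameters max_depth and min_samples_split
param_test2 = {'max_depth':range(5,16,2), 'min_samples_split':range(200,1001,200)}
# scoring, n_jobs and cv are assumed to match the first grid search
gsearch2 = GridSearchCV(estimator = GradientBoostingClassifier(learning_rate=0.1,
                                                               n_estimators=60,
                                                               max_features='sqrt',
                                                               subsample=0.8,
                                                               random_state=10),
                        param_grid = param_test2,
                        scoring='roc_auc', n_jobs=4,
                        iid=False, cv=5)
gsearch2.fit(train[predictors], train[target])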


gsearch2.grid_scores_, gsearch2.best_params_, gsearch2.best_score_
([mean: 0.83297, std: 0.01226, params: {'max_depth': 5, 'min_samples_split': 200},
  mean: 0.83251, std: 0.01054, params: {'max_depth': 5, 'min_samples_split': 400},
  mean: 0.83386, std: 0.01415, params: {'max_depth': 5, 'min_samples_split': 600},
  mean: 0.83379, std: 0.01169, params: {'max_depth': 5, 'min_samples_split': 800},
  mean: 0.83339, std: 0.01266, params: {'max_depth': 5, 'min_samples_split': 1000},
  mean: 0.83392, std: 0.00758, params: {'max_depth': 7, 'min_samples_split': 200},
  mean: 0.83663, std: 0.00991, params: {'max_depth': 7, 'min_samples_split': 400},
  mean: 0.83481, std: 0.00826, params: {'max_depth': 7, 'min_samples_split': 600},
  mean: 0.83786, std: 0.01067, params: {'max_depth': 7, 'min_samples_split': 800},
  mean: 0.83769, std: 0.01060, params: {'max_depth': 7, 'min_samples_split': 1000},
  mean: 0.83581, std: 0.01003, params: {'max_depth': 9, 'min_samples_split': 200},
  mean: 0.83729, std: 0.00959, params: {'max_depth': 9, 'min_samples_split': 400},
  mean: 0.83317, std: 0.00881, params: {'max_depth': 9, 'min_samples_split': 600},
  mean: 0.83831, std: 0.00953, params: {'max_depth': 9, 'min_samples_split': 800},
  mean: 0.83753, std: 0.01012, params: {'max_depth': 9, 'min_samples_split': 1000},
  mean: 0.82978, std: 0.00888, params: {'max_depth': 11, 'min_samples_split': 200},
  mean: 0.82951, std: 0.00621, params: {'max_depth': 11, 'min_samples_split': 400},
  mean: 0.83305, std: 0.01017, params: {'max_depth': 11, 'min_samples_split': 600},
  mean: 0.83192, std: 0.00844, params: {'max_depth': 11, 'min_samples_split': 800},
  mean: 0.83566, std: 0.01018, params: {'max_depth': 11, 'min_samples_split': 1000},
  mean: 0.82438, std: 0.01078, params: {'max_depth': 13, 'min_samples_split': 200},
  mean: 0.83010, std: 0.00862, params: {'max_depth': 13, 'min_samples_split': 400},
  mean: 0.83228, std: 0.01020, params: {'max_depth': 13, 'min_samples_split': 600},
  mean: 0.83480, std: 0.01193, params: {'max_depth': 13, 'min_samples_split': 800},
  mean: 0.83372, std: 0.00844, params: {'max_depth': 13, 'min_samples_split': 1000},
  mean: 0.82056, std: 0.00913, params: {'max_depth': 15, 'min_samples_split': 200},
  mean: 0.82217, std: 0.00961, params: {'max_depth': 15, 'min_samples_split': 400},
  mean: 0.82916, std: 0.00927, params: {'max_depth': 15, 'min_samples_split': 600},
  mean: 0.82900, std: 0.01046, params: {'max_depth': 15, 'min_samples_split': 800},
  mean: 0.83320, std: 0.01389, params: {'max_depth': 15, 'min_samples_split': 1000}],
 {'max_depth': 9, 'min_samples_split': 800},


param_test3 = {'min_samples_split':range(1000,2100,200), 'min_samples_leaf':range(30,71,10)}
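# n_estimators=60 and max_depth=9 come from the two previous searches;
# scoring, n_jobs and cv are assumed to match the first grid search
gsearch3 = GridSearchCV(estimator = GradientBoostingClassifier(learning_rate=0.1,
                                                               n_estimators=60, max_depth=9,
                                                               max_features='sqrt',
                                                               subsample=0.8,
                                                               random_state=10),
                        param_grid = param_test3,
                        scoring='roc_auc', n_jobs=4,
                        iid=False, cv=5)
gsearch3.fit(train[predictors], train[target])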


gsearch3.grid_scores_, gsearch3.best_params_, gsearch3.best_score_
([mean: 0.83821, std: 0.01092, params: {'min_samples_split': 1000, 'min_samples_leaf': 30},
  mean: 0.83889, std: 0.01271, params: {'min_samples_split': 1200, 'min_samples_leaf': 30},
  mean: 0.83552, std: 0.01024, params: {'min_samples_split': 1400, 'min_samples_leaf': 30},
  mean: 0.83683, std: 0.01429, params: {'min_samples_split': 1600, 'min_samples_leaf': 30},
  mean: 0.83958, std: 0.01233, params: {'min_samples_split': 1800, 'min_samples_leaf': 30},
  mean: 0.83852, std: 0.01097, params: {'min_samples_split': 2000, 'min_samples_leaf': 30},
  mean: 0.83851, std: 0.00908, params: {'min_samples_split': 1000, 'min_samples_leaf': 40},
  mean: 0.83757, std: 0.01274, params: {'min_samples_split': 1200, 'min_samples_leaf': 40},
  mean: 0.83757, std: 0.01074, params: {'min_samples_split': 1400, 'min_samples_leaf': 40},
  mean: 0.83779, std: 0.01199, params: {'min_samples_split': 1600, 'min_samples_leaf': 40},
  mean: 0.83764, std: 0.01366, params: {'min_samples_split': 1800, 'min_samples_leaf': 40},
  mean: 0.83759, std: 0.01222, params: {'min_samples_split': 2000, 'min_samples_leaf': 40},
  mean: 0.83650, std: 0.00983, params: {'min_samples_split': 1000, 'min_samples_leaf': 50},
  mean: 0.83784, std: 0.01169, params: {'min_samples_split': 1200, 'min_samples_leaf': 50},
  mean: 0.83892, std: 0.01234, params: {'min_samples_split': 1400, 'min_samples_leaf': 50},
  mean: 0.83825, std: 0.01371, params: {'min_samples_split': 1600, 'min_samples_leaf': 50},
  mean: 0.83806, std: 0.01099, params: {'min_samples_split': 1800, 'min_samples_leaf': 50},
  mean: 0.83821, std: 0.01014, params: {'min_samples_split': 2000, 'min_samples_leaf': 50},
  mean: 0.83636, std: 0.01118, params: {'min_samples_split': 1000, 'min_samples_leaf': 60},
  mean: 0.83976, std: 0.00994, params: {'min_samples_split': 1200, 'min_samples_leaf': 60},
  mean: 0.83735, std: 0.01217, params: {'min_samples_split': 1400, 'min_samples_leaf': 60},
  mean: 0.83685, std: 0.01325, params: {'min_samples_split': 1600, 'min_samples_leaf': 60},
  mean: 0.83626, std: 0.01153, params: {'min_samples_split': 1800, 'min_samples_leaf': 60},
  mean: 0.83788, std: 0.01147, params: {'min_samples_split': 2000, 'min_samples_leaf': 60},
  mean: 0.83751, std: 0.01027, params: {'min_samples_split': 1000, 'min_samples_leaf': 70},
  mean: 0.83854, std: 0.01111, params: {'min_samples_split': 1200, 'min_samples_leaf': 70},
  mean: 0.83777, std: 0.01186, params: {'min_samples_split': 1400, 'min_samples_leaf': 70},
  mean: 0.83796, std: 0.01093, params: {'min_samples_split': 1600, 'min_samples_leaf': 70},
  mean: 0.83816, std: 0.01052, params: {'min_samples_split': 1800, 'min_samples_leaf': 70},
  mean: 0.83677, std: 0.01164, params: {'min_samples_split': 2000, 'min_samples_leaf': 70}],
 {'min_samples_leaf': 60, 'min_samples_split': 1200},


modelfit(gsearch3.best_estimator_, train, predictors)
Model Report
Accuracy : 0.9854
AUC Score (Train): 0.8965
CV Score : Mean - 0.8397598 | Std - 0.009936017 | Min - 0.8255474 | Max - 0.8527672



param_test4 = {'max_features':range(7,20,2)}
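# n_estimators and random_state are assumed to carry over from the earlier
# searches, as are scoring, n_jobs and cv
gsearch4 = GridSearchCV(estimator = GradientBoostingClassifier(learning_rate=0.1,
                                                               n_estimators=60, max_depth=9,
                                                               min_samples_split=1200,
                                                               min_samples_leaf=60,
                                                               subsample=0.8,
                                                               random_state=10),
                        param_grid = param_test4,
                        scoring='roc_auc', n_jobs=4,
                        iid=False, cv=5)
gsearch4.fit(train[predictors], train[target])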


gsearch4.grid_scores_, gsearch4.best_params_, gsearch4.best_score_
([mean: 0.83976, std: 0.00994, params: {'max_features': 7},
  mean: 0.83648, std: 0.00988, params: {'max_features': 9},
  mean: 0.83919, std: 0.01042, params: {'max_features': 11},
  mean: 0.83738, std: 0.01017, params: {'max_features': 13},
  mean: 0.83898, std: 0.01101, params: {'max_features': 15},
  mean: 0.83495, std: 0.00931, params: {'max_features': 17},
  mean: 0.83524, std: 0.01018, params: {'max_features': 19}],
 {'max_features': 7},


5.4 Tuning subsample and lowering the learning rate


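# Tune subsample; the other values are the ones selected by the previous searches
# (n_estimators, max_depth, max_features and random_state are assumed to carry over)
param_test5 = {'subsample':[0.6,0.7,0.75,0.8,0.85,0.9]}
gsearch5 = GridSearchCV(estimator = GradientBoostingClassifier(learning_rate=0.1,
                                                               n_estimators=60, max_depth=9,
                                                               min_samples_split=1200,
                                                               min_samples_leaf=60,
                                                               subsample=0.8, max_features=7,
                                                               random_state=10),
                        param_grid = param_test5,
                        scoring='roc_auc', n_jobs=4,
                        iid=False, cv=5)
gsearch5.fit(train[predictors], train[target])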


gsearch5.grid_scores_, gsearch5.best_params_, gsearch5.best_score_
([mean: 0.83645, std: 0.00942, params: {'subsample': 0.6},
  mean: 0.83629, std: 0.01185, params: {'subsample': 0.7},
  mean: 0.83601, std: 0.01074, params: {'subsample': 0.75},
  mean: 0.83976, std: 0.00994, params: {'subsample': 0.8},
  mean: 0.84086, std: 0.00997, params: {'subsample': 0.85},
  mean: 0.83828, std: 0.00984, params: {'subsample': 0.9}],
 {'subsample': 0.85},

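The result is 0.85. All the parameters are now set, and what remains is to lower the learning rate further and increase the number of trees in proportion. Note that the tree counts are derived this way rather than searched, so they may not be optimal, but they are a reasonable choice. As the number of trees grows, finding the optimum by CV becomes increasingly expensive. To give a sense of model performance, I also report the leaderboard score of each model in the competition; the data behind that score is not public, so you will not be able to reproduce these numbers, and they serve only to help us understand model performance.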
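
The proportional scaling described here amounts to keeping the product of learning rate and tree count constant. A minimal sketch, with a hypothetical helper name, starting from the tuned pair of 0.1 and 60 trees:

```python
def scaled_trees(base_lr, base_trees, new_lr):
    """Scale the tree count so that learning_rate * n_estimators stays constant."""
    return round(base_trees * base_lr / new_lr)

# Learning rates tried in the remainder of this section
for lr in (0.05, 0.01, 0.005):
    print(lr, scaled_trees(0.1, 60, lr))
```

This reproduces the tree counts of 120, 600 and 1200 used in the steps that follow. It is a heuristic for choosing a starting point, not a substitute for checking the result with CV.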

First we halve the learning rate to 0.05, which doubles the number of trees to 120.

predictors = [x for x in train.columns if x not in [target, IDcol]]
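# n_estimators=120 follows the text above; min_samples_split and max_features
# are the values selected by the earlier grid searches
gbm_tuned_1 = GradientBoostingClassifier(learning_rate=0.05, n_estimators=120,
                                         max_depth=9, min_samples_split=1200,
                                         min_samples_leaf=60, subsample=0.85,
                                         random_state=10, max_features=7)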

modelfit(gbm_tuned_1, train, predictors)
Model Report
Accuracy : 0.9854
AUC Score (Train): 0.8976
CV Score : Mean - 0.8391332 | Std - 0.009437997 | Min - 0.8271238 | Max - 0.8511221


Next we lower the learning rate to one tenth of the original value, 0.01, which brings the number of trees to 600.

predictors = [x for x in train.columns if x not in [target, IDcol]]
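# n_estimators=600 follows the text above; min_samples_split and max_features
# are the values selected by the earlier grid searches
gbm_tuned_2 = GradientBoostingClassifier(learning_rate=0.01, n_estimators=600,
                                         max_depth=9, min_samples_split=1200,
                                         min_samples_leaf=60, subsample=0.85,
                                         random_state=10, max_features=7)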

modelfit(gbm_tuned_2, train, predictors)
Model Report
Accuracy : 0.9854
AUC Score (Train): 0.9
CV Score : Mean - 0.8407913 | Std - 0.01011421 | Min - 0.8255379 | Max - 0.8522251


We then lower the learning rate to one twentieth of the original value, 0.005, which gives 1200 trees.

predictors = [x for x in train.columns if x not in [target, IDcol]]
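# n_estimators=1200 follows the text above; max_features is the value
# selected by the earlier grid search
gbm_tuned_3 = GradientBoostingClassifier(learning_rate=0.005, n_estimators=1200,
                                         max_depth=9, min_samples_split=1200,
                                         min_samples_leaf=60, subsample=0.85,
                                         random_state=10, max_features=7)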

modelfit(gbm_tuned_3, train, predictors, performCV=False)
Model Report
Accuracy : 0.9854
AUC Score (Train): 0.9007


The leaderboard score dropped slightly, so we stop lowering the learning rate and only increase the number of trees, trying 1500.

predictors = [x for x in train.columns if x not in [target, IDcol]]
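# n_estimators=1500 follows the text above; max_features is the value
# selected by the earlier grid search
gbm_tuned_4 = GradientBoostingClassifier(learning_rate=0.005, n_estimators=1500,
                                         max_depth=9, min_samples_split=1200,
                                         min_samples_leaf=60, subsample=0.85,
                                         random_state=10, max_features=7)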

modelfit(gbm_tuned_4, train, predictors, performCV=False)
Model Report
Accuracy : 0.9854
AUC Score (Train): 0.9063





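This article covered the GBM model in detail. We first looked at what boosting is and then went through the parameters, which fall into three groups: tree parameters, boosting parameters, and other parameters that affect the model. Finally, we described a general approach to solving a problem with GBM and applied it to the AV Data Hackathon 3.x problem. I hope the article has helped you understand GBM better and gives you more confidence the next time you use it.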


In [1]: from sklearn.ensemble import GradientBoostingClassifier

In [2]: GradientBoostingClassifier?
Init signature: GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_split=1e-07, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')
Gradient Boosting for classification.

GB builds an additive model in a
forward stage-wise fashion; it allows for the optimization of
arbitrary differentiable loss functions. In each stage ``n_classes_``
regression trees are fit on the negative gradient of the
binomial or multinomial deviance loss function. Binary classification
is a special case where only a single regression tree is induced.

Read more in the :ref:`User Guide <gradient_boosting>`.

loss : {'deviance', 'exponential'}, optional (default='deviance')
    loss function to be optimized. 'deviance' refers to
    deviance (= logistic regression) for classification
    with probabilistic outputs. For loss 'exponential' gradient
    boosting recovers the AdaBoost algorithm.

learning_rate : float, optional (default=0.1)
    learning rate shrinks the contribution of each tree by `learning_rate`.
    There is a trade-off between learning_rate and n_estimators.

n_estimators : int (default=100)
    The number of boosting stages to perform. Gradient boosting
    is fairly robust to over-fitting so a large number usually
    results in better performance.

max_depth : integer, optional (default=3)
    maximum depth of the individual regression estimators. The maximum
    depth limits the number of nodes in the tree. Tune this parameter
    for best performance; the best value depends on the interaction
    of the input variables.

criterion : string, optional (default="friedman_mse")
    The function to measure the quality of a split. Supported criteria
    are "friedman_mse" for the mean squared error with improvement
    score by Friedman, "mse" for mean squared error, and "mae" for
    the mean absolute error. The default value of "friedman_mse" is
    generally the best as it can provide a better approximation in
    some cases.

    .. versionadded:: 0.18

min_samples_split : int, float, optional (default=2)
    The minimum number of samples required to split an internal node:

    - If int, then consider `min_samples_split` as the minimum number.
    - If float, then `min_samples_split` is a percentage and
      `ceil(min_samples_split * n_samples)` are the minimum
      number of samples for each split.

    .. versionchanged:: 0.18
       Added float values for percentages.

min_samples_leaf : int, float, optional (default=1)
    The minimum number of samples required to be at a leaf node:

    - If int, then consider `min_samples_leaf` as the minimum number.
    - If float, then `min_samples_leaf` is a percentage and
      `ceil(min_samples_leaf * n_samples)` are the minimum
      number of samples for each node.

    .. versionchanged:: 0.18
       Added float values for percentages.

min_weight_fraction_leaf : float, optional (default=0.)
    The minimum weighted fraction of the sum total of weights (of all
    the input samples) required to be at a leaf node. Samples have
    equal weight when sample_weight is not provided.

subsample : float, optional (default=1.0)
    The fraction of samples to be used for fitting the individual base
    learners. If smaller than 1.0 this results in Stochastic Gradient
    Boosting. `subsample` interacts with the parameter `n_estimators`.
    Choosing `subsample < 1.0` leads to a reduction of variance
    and an increase in bias.

max_features : int, float, string or None, optional (default=None)
    The number of features to consider when looking for the best split:

    - If int, then consider `max_features` features at each split.
    - If float, then `max_features` is a percentage and
      `int(max_features * n_features)` features are considered at each
      split.
    - If "auto", then `max_features=sqrt(n_features)`.
    - If "sqrt", then `max_features=sqrt(n_features)`.
    - If "log2", then `max_features=log2(n_features)`.
    - If None, then `max_features=n_features`.

    Choosing `max_features < n_features` leads to a reduction of variance
    and an increase in bias.

    Note: the search for a split does not stop until at least one
    valid partition of the node samples is found, even if it requires to
    effectively inspect more than ``max_features`` features.

max_leaf_nodes : int or None, optional (default=None)
    Grow trees with ``max_leaf_nodes`` in best-first fashion.
    Best nodes are defined as relative reduction in impurity.
    If None then unlimited number of leaf nodes.

min_impurity_split : float, optional (default=1e-7)
    Threshold for early stopping in tree growth. A node will split
    if its impurity is above the threshold, otherwise it is a leaf.

    .. versionadded:: 0.18

init : BaseEstimator, None, optional (default=None)
    An estimator object that is used to compute the initial
    predictions. ``init`` has to provide ``fit`` and ``predict``.
    If None it uses ``loss.init_estimator``.

verbose : int, default: 0
    Enable verbose output. If 1 then it prints progress and performance
    once in a while (the more trees the lower the frequency). If greater
    than 1 then it prints progress and performance for every tree.

warm_start : bool, default: False
    When set to ``True``, reuse the solution of the previous call to fit
    and add more estimators to the ensemble, otherwise, just erase the
    previous solution.

random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator;
    If RandomState instance, random_state is the random number generator;
    If None, the random number generator is the RandomState instance used
    by `np.random`.

presort : bool or 'auto', optional (default='auto')
    Whether to presort the data to speed up the finding of best splits in
    fitting. Auto mode by default will use presorting on dense data and
    default to normal sorting on sparse data. Setting presort to true on
    sparse data will raise an error.

    .. versionadded:: 0.17
       *presort* parameter.

feature_importances_ : array, shape = [n_features]
    The feature importances (the higher, the more important the feature).

oob_improvement_ : array, shape = [n_estimators]
    The improvement in loss (= deviance) on the out-of-bag samples
    relative to the previous iteration.
    ``oob_improvement_[0]`` is the improvement in
    loss of the first stage over the ``init`` estimator.

train_score_ : array, shape = [n_estimators]
    The i-th score ``train_score_[i]`` is the deviance (= loss) of the
    model at iteration ``i`` on the in-bag sample.
    If ``subsample == 1`` this is the deviance on the training data.

loss_ : LossFunction
    The concrete ``LossFunction`` object.

init : BaseEstimator
    The estimator that provides the initial predictions.
    Set via the ``init`` argument or ``loss.init_estimator``.

estimators_ : ndarray of DecisionTreeRegressor, shape = [n_estimators, ``loss_.K``]
    The collection of fitted sub-estimators. ``loss_.K`` is 1 for binary
    classification, otherwise n_classes.

See also
sklearn.tree.DecisionTreeClassifier, RandomForestClassifier

J. Friedman, Greedy Function Approximation: A Gradient Boosting
Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999

T. Hastie, R. Tibshirani and J. Friedman.
Elements of Statistical Learning Ed. 2, Springer, 2009.
