初窥GBDT

2017-04-16 本文已影响0人粒子_6816

1. Boosting

Boosting: Reweighing the sample
for t = 1,...,T:
. construct distribution D_t on {1,...,m}
. find weak hypothesis ("rule of thumb")
![](http://latex.codecogs.com/png.latex?$$h_t:x \to {-1,+1}$$)
with small errorε_t
![](http://latex.codecogs.com/png.latex?$$\varepsilon_t=P_{r D_t}[h(x_i) \ne y_i] = \sum_{h_t(x_i) \ne y_i}{D_t(i)}$$)
.output final hypothesis. H_final
Ada Boost
Ada Boost通过调整样本的权重，来重视误分类样本的影响。在下一个model中使用带权重的样本来训练新的弱分类器。Ada boost使用误差来计算权重的更新比例。
constructing D_t .
![](http://latex.codecogs.com/png.latex?$$D_0(i) = \frac{1}{m}$$)
given D_t and h_t
![](http://latex.codecogs.com/png.latex?$$D_{t+1} = \frac{D_t}{Z_t}*\left{
\begin{aligned}
& \exp^{\alpha t} & \text{ if } y_i=h_t(x_i) \
& \exp^{- \alpha t} & \text{ if } y_i=h_t(x_i)
\end{aligned}
\right.$$)
where Z_t = normalization constant
![](http://latex.codecogs.com/png.latex?$$\alpha_t =\frac{1}{2}\ln(\frac{1-\varepsilon_t}{\varepsilon_t}) > 0$$)
final hypothesis:
![](http://latex.codecogs.com/png.latex?$$H_{final}(x) =sgn ( \sum_t{\alpha_t h_t(x)} ). $$)
Bootstrap也有类似思想，它在每一步迭代时不改变模型本身，也不计算残差，而是从N个instance训练集中按一定概率重新抽取N个instance出来（单个instance可以被重复sample），对着这N个新的instance再训练一轮。由于数据集变了迭代模型训练结果也不一样，而一个instance被前面分错的越厉害，它的概率就被设的越高，这样就能同样达到逐步关注被分错的instance，逐步完善的效果。Adaboost的方法被实践证明是一种很好的防止过拟合的方法，但至于为什么则至今没从理论上被证明。

2. Gradient Boost

Begin
= \arg \ min_\rho \sum_{i=1}^{N} L(y_i, \rho)$$)
For m = 1 to M do:
![](http://latex.codecogs.com/png.latex?$$\tilde{y_i} = - [\frac{\partial L(y_i, F(x_i) }{\partial F(x_i)}]{F(x)} = F{m-1}(x), i = 1,N $$)
![](http://latex.codecogs.com/png.latex?$$\alpha_m = \arg\min_{\alpha, \beta} \sum_{i=1}{N}[\tilde{y_i} - \beta h(x_i; \alpha)]^2 $$)
![](http://latex.codecogs.com/png.latex?$$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L({y_i} ; F_{m-1}(x_i) + \rho F(x_i;\alpha_m)) $$)
= F_{m-1}(x) + \rho_m h(x; \alpha_m)$$)
endFor
end

3. GBDT

GBDT(Gradient Boosting Decision Tree) 又叫 MART（Multiple Additive Regression Tree)，是一种迭代的决策树算法，该算法由多棵决策树组成，所有树的结论累加起来做最终答案。它在被提出之初就和SVM一起被认为是泛化能力（generalization)较强的算法。近些年更因为被用于搜索排序的机器学习模型而引起大家关注。
GBDT的核心就在于，每一棵树学的是之前所有树结论和的残差，这个残差就是一个加预测之后能的真实值的累加量。GBDT中所用的树是回归树，而不是分类树。分枝时穷举每一个feature的每个阈值找最好的分割点，衡量最好的标准不再是最大熵，而是最小化均方差--即（每个样本的真实值 - 预测值）^2 的总和 / N，或者说是每个人的预测误差平方和除以N。GBDT的核心在于累加所有树的结果作为最终结果，而分类树的结果显然是没办法累加的。

PS: 熵

转自知乎

参考：

1.http://blog.csdn.net/w28971023/article/details/8240756

初窥GBDT

1. Boosting

2. Gradient Boost

3. GBDT

PS: 熵

参考：

猜你喜欢

热点阅读