网络数据挖掘-L9 SVM 决策树

2018-07-12 本文已影响0人 gb_QA_log

title: 网络数据挖掘-L9 SVM 决策树
date: 2017-07-26 11:48:56
categories: Datamining
mathjax: true
tags: [webDataMining]

L9 Decision Trees & SVM

Decision Trees：ID3算法-CSDN博客

决策树的构造思路

Choose the best attribute(s) to split the remaining instances
and make that attribute a decision node
Repeat this process for recursively for each child
Stop when:
- All the instances have the same target attribute value
- There are no more attributes
- There are no more instances

ID3算法：

每次选择信息熵增益最大的属性来做决策
直到结束状态

所以需要引入两个概念：

信息熵：
对于属性T，当前信息熵：
S为需要决策的属性，如yes no
$Entropy(S)=\sum -P_s * \log_2 P_s$
$=-P_{(s=yes)} * \log_2 P_{(s=yes)}-P_{(s=no)} * \log_2 P_{(s=no)}$
假设按照T属性分类后，这里假设T属性有3个值：
$Entropy(S_{(T=1)})=\sum -P_s * \log_2 P_s$
$Entropy(S_{(T=2)})=\sum -P_s * \log_2 P_s$
$Entropy(S_{(T=3)})=\sum -P_s * \log_2 P_s$
则划分后的信息熵为：
$Entropy(S|T)=P_{S_1}*Entropy(S_1)+P_{S_2}*Entropy(S_2)$
$+P_{S_3}*Entropy(S_3)$
信息熵增益
$IG(T)=Entropy(S)-Entropy(S|T)$

ID3的缺点

Uses expected entropy reduction, not actual reduction
数据需是离散值

决策树的缺点

决策快但是构造时候慢
errors propagating 一个结点时出错下面的判断都是错的

SVM

general idea

Paste_Image.png
在样本空间里，

w^Tx+b=0

划分平面，且要使得

\frac{2}{||w||}

最大,等价于

||w||^2=w^Tw

最小
对于测试数据，
\begin{cases}
w^Tx_i+b \geq 1, &\quad y_i = +1 \
w^Tx_i+b \leq -1,&\quad y_i = -1
\end{cases}

Paste_Image.png

于 $||w||^2=w^Tw$ 最小，是一个凸二次规划(convex quadratic programming)问题，用Lagrange multiplier $\alpha _i \geq 0$ 得到其对偶问题(dual problem)，可以高效求解。

$Q(\alpha)=\sum \alpha_j - \frac{1}{2}\sum \sum \alpha_i \alpha_j y_i y_j x_i^Tx_j$
(1) $\sum \alpha_iy_i=0$
(2) $\alpha_i \geq 0$ ,for all $\alpha_i$
$w=\sum \alpha_iy_ix_i$ , $b=y_k-\sum \alpha_i y_i y_j x_i^Tx_k$ ,for any $a_k >0$
$f(x)=w^Tx+b$
详见西瓜书

kernel trick 核函数

详见西瓜书

soft margin & solving