
Support Vector Machines: The Final Chapter

2019-03-25  Vophan

SVM, which takes up a good half of Statistical Learning Methods (统计学习方法), finally came to a close yesterday when I finished my own implementation. SVM is no simple thing: despite putting a lot of effort into it, I still haven't understood every detail by the end. Here I want to summarize SVM, and also correct some misunderstandings from my earlier posts.

0x01 Algorithm

At a high level: given a set of data, how do we train an SVM model to fit it?

  1. Decide which kernel to use, linear or polynomial; this depends on whether your data is linearly separable.

  2. Choose the maximum number of iterations and the penalty parameter C, which controls how wide or narrow the soft margin is relative to a hard margin.

  3. Use the SMO algorithm to select the two Lagrange multipliers to optimize: pick the first by checking which sample point violates the KKT conditions, and determine the second via the equality constraint.

  4. Depending on whether the labels corresponding to the two chosen multipliers are the same, compute the lower and upper bounds on the multiplier.

  5. Compute the error E via the prediction function f(x):
    E_i = f(x_i) - y_i
    then compute \eta from the data, that is:
    \eta = K_{1,1}+K_{2,2}-2K_{1,2}
    and finally obtain the unclipped value of \alpha_2:
    \alpha_{2,new} = \alpha_{2,old}+\frac{y_2(E_1-E_2)}{\eta}

  6. Clip \alpha_{2,new} using the bounds obtained above:

    def compare(self, alpha_not_clip, L, H):
    
        if alpha_not_clip < L:
            return L
        elif alpha_not_clip > H:
            return H
        else:
            return alpha_not_clip
    
  7. Compute \alpha_{1,new} from \alpha_{2,new}:
    \alpha_{1,new} = \alpha_{1,old} + y_1y_2(\alpha_{2,old} - \alpha_{2,new})
    This formula comes from the equality constraint:
    y_1\alpha_{1,new} + y_2\alpha_{2,new} = y_1\alpha_{1,old}+y_2\alpha_{2,old} = \zeta

  8. From \alpha_{1,new} and \alpha_{2,new}, compute b_{1,new} and b_{2,new}:
    b_{1,new} = -E_1-y_1K_{11}(\alpha_{1,new} - \alpha_{1,old}) - y_2K_{21}(\alpha_{2,new} - \alpha_{2,old}) + b_{old}
    and b_{2,new} analogously.

    This formula comes from the support-vector condition:
    \sum_{i=1}^N\alpha_iy_iK_{i1}+b=y_1
    Substituting the definition of E then yields the final b.

  9. Update the values of E:
    E_{i}^{new} = \sum_{j \in S}y_j\alpha_jK(x_i,x_j)+b_{new}-y_i
    where S is the set of support vectors.

  10. Repeat until the maximum number of iterations is reached.
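The inner update in steps 4-9 can be condensed into a single function. The sketch below is my own condensation for a linear kernel only; the function and variable names (`smo_pair_update`, `a_i`, `a_j`) are mine and do not appear in the original code:

```python
import numpy as np

def smo_pair_update(X, y, alpha, b, i, j, C):
    """One SMO update of the pair (alpha_i, alpha_j) for a linear kernel."""
    K = X @ X.T                                    # Gram matrix: K[p, q] = <x_p, x_q>
    f = lambda p: np.dot(alpha * y, K[:, p]) + b   # decision function f(x_p)
    E_i, E_j = f(i) - y[i], f(j) - y[j]            # prediction errors (step 5)

    # Bounds on alpha_j from 0 <= alpha <= C and the equality constraint (step 4)
    if y[i] == y[j]:
        L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    else:
        L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])

    eta = K[i, i] + K[j, j] - 2 * K[i, j]          # curvature along the constraint
    if eta <= 0 or L == H:
        return alpha, b                            # skip degenerate pairs

    # Unconstrained step on alpha_j, then clip to [L, H] (steps 5-6)
    a_j = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)
    # Move alpha_i so the equality constraint stays satisfied (step 7)
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)

    # Recompute b from whichever multiplier lies strictly inside (0, C) (step 8)
    b1 = b - E_i - y[i] * K[i, i] * (a_i - alpha[i]) - y[j] * K[i, j] * (a_j - alpha[j])
    b2 = b - E_j - y[i] * K[i, j] * (a_i - alpha[i]) - y[j] * K[j, j] * (a_j - alpha[j])
    b = b1 if 0 < a_i < C else b2 if 0 < a_j < C else (b1 + b2) / 2

    alpha = alpha.copy()
    alpha[i], alpha[j] = a_i, a_j
    return alpha, b
```

After every update the invariant \sum_i \alpha_i y_i = 0 still holds, which is exactly why the two multipliers must move together.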

0x02 Things I Still Don't Understand

Even after reading a great deal, there are still some points I can't make sense of:

  1. The exact way to check whether a sample point satisfies the KKT conditions.
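For what it's worth, the check commonly used in practice tests the three KKT cases within a small tolerance rather than with exact float comparisons. A sketch of that convention (the function name and `eps` are mine, not from the original code):

```python
def satisfies_kkt(alpha_i, y_i, f_i, C, eps=1e-3):
    """KKT conditions for the soft-margin SVM dual:
       alpha == 0       ->  y_i * f(x_i) >= 1   (outside the margin)
       0 < alpha < C    ->  y_i * f(x_i) == 1   (exactly on the margin)
       alpha == C       ->  y_i * f(x_i) <= 1   (inside the margin / misclassified)
    Each case is checked up to a small tolerance eps."""
    margin = y_i * f_i
    if alpha_i <= eps:                  # treat as alpha == 0
        return margin >= 1 - eps
    if alpha_i >= C - eps:              # treat as alpha == C
        return margin <= 1 + eps
    return abs(margin - 1) <= eps       # interior multiplier
```

Without the tolerance, the interior case `y_i * f(x_i) == 1` almost never holds for floats, so every interior point would be flagged as a violator.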

0x03 Supplement

Since I've seen many people bring up coordinate ascent, let me say a few words about it here:

Coordinate ascent is indeed very similar to SMO, but there is a subtle and important difference: coordinate ascent optimizes one variable at a time with all the others held fixed, whereas SMO must update two multipliers jointly, because the equality constraint \sum_i\alpha_iy_i = 0 would pin any single multiplier in place.


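To make the contrast with SMO concrete: plain coordinate ascent moves exactly one variable per update, which only works because there is no coupling constraint. A toy sketch on a made-up concave objective (not the SVM dual; all names here are mine):

```python
def coordinate_ascent(n_sweeps=100):
    """Maximize f(x, y) = -(x^2 + y^2 - x*y - 3*x) by exact coordinate updates.
    Setting each partial derivative to zero gives the per-coordinate optima:
        df/dx = 0  ->  x = (y + 3) / 2   (with y held fixed)
        df/dy = 0  ->  y = x / 2         (with x held fixed)
    Alternating these converges to the maximizer (x, y) = (2, 1).
    SMO cannot do this one variable at a time: the constraint
    sum(alpha_i * y_i) = 0 would freeze every single multiplier."""
    x, y = 0.0, 0.0
    for _ in range(n_sweeps):
        x = (y + 3.0) / 2.0   # exact argmax over x
        y = x / 2.0           # exact argmax over y
    return x, y
```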
Some additional understanding of the KKT conditions:

The KKT conditions are derived for constrained optimization problems (specifically, ones with inequality constraints); they are the conditions an optimal solution must satisfy.

For details, see this blog post: 浅谈最优化问题的KKT条件.
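For reference, for a problem \min f(x) subject to g_i(x) \le 0 and h_j(x) = 0, the KKT conditions at an optimum x^* read:

```latex
\begin{aligned}
&\nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) + \sum_j \lambda_j \nabla h_j(x^*) = 0
  && \text{(stationarity)}\\
&g_i(x^*) \le 0,\qquad h_j(x^*) = 0
  && \text{(primal feasibility)}\\
&\mu_i \ge 0
  && \text{(dual feasibility)}\\
&\mu_i\, g_i(x^*) = 0
  && \text{(complementary slackness)}
\end{aligned}
```

In the SVM dual, complementary slackness is what splits samples into the three cases \alpha = 0, 0 < \alpha < C, and \alpha = C used in the KKT check above.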

0x04 Code

The code is based on my friend Tenshine's implementation, with one improvement: I unrolled all of his loops into matrix operations, which speeds up training. (On the difference between matrix operations and loops, see this answer: 在机器学习的算法中, 矩阵运算的效率总是优于for循环吗?)
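As a quick sanity check that the vectorized decision function really computes the same thing as its loop version (toy data; the function names `f_loop` and `f_vec` are mine):

```python
import numpy as np

def f_loop(alpha, y, X, x, b):
    """Decision function as an explicit loop over training points."""
    total = 0.0
    for k in range(len(alpha)):
        total += alpha[k] * y[k] * np.dot(X[k], x)
    return total + b

def f_vec(alpha, y, X, x, b):
    """The same sum expressed as one matrix expression."""
    return np.dot(alpha * y, X @ x) + b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(rng.normal(size=200))
alpha = rng.uniform(0, 1, size=200)
x = np.array([0.5, -1.0])
assert np.isclose(f_loop(alpha, y, X, x, 0.1), f_vec(alpha, y, X, x, 0.1))
```

The SVM class below uses the same vectorized form in `com_f` and `predict`.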

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width',
                  'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    for i in range(len(data)):
        if data[i, -1] == 0:
            data[i, -1] = -1
    return data[:, :2], data[:, -1]

class SVM:

    def __init__(self, data, max_iters=1000, kernal="linear", C=0.0001):
        self.max_iters = max_iters
        self.Kernal = kernal
        self.load_data(data[0], data[1])
        self.C = C

    def load_data(self, X, y):
        self.X = X
        self.Y = y

    def init_args(self):

        self.data_num = self.X.shape[0]
        self.dim = self.X.shape[1]
        self.alpha = np.zeros(self.data_num)
        self.b = 0
        self.f = [self.com_f(i) for i in range(self.data_num)]
        self.E = [self.com_E(i) for i in range(self.data_num)]

    def com_f(self, idx):
        
        return self.kernal(np.multiply(self.alpha, self.Y), np.dot(self.X[idx, :], self.X.T)) + self.b

    def com_E(self, idx):

        return self.f[idx] - self.Y[idx]

    def com_eta(self, i, j):

        return self.kernal(self.X[i], self.X[i]) + self.kernal(self.X[j], self.X[j]) - 2*self.kernal(self.X[i], self.X[j])

    def search_sample(self):
        
        l1 = [i for i in range(self.data_num) if 0 < self.alpha[i] < self.C]
        l2 = [i for i in range(self.data_num) if i not in l1]
        l1.extend(l2)

        for i in l1:
            if self.kkt(i):
                continue
            E = self.E[i]

            if E > 0:
                j = min(range(self.data_num), key=lambda x: self.E[x])
            else:
                j = max(range(self.data_num), key=lambda x: self.E[x])

            return i, j

    def kkt(self, idx):

        # recompute f(x_idx) on the fly: self.f is only filled in init_args
        # and would go stale once alpha and b change during training;
        # note these checks are usually done within a small tolerance rather
        # than with exact float (in)equalities
        judge = self.Y[idx] * self.com_f(idx)
        if self.alpha[idx] == 0:
            return judge >= 1
        elif self.alpha[idx] > 0 and self.alpha[idx] < self.C:
            return judge == 1
        else:
            return judge <= 1

    def compare(self, alpha_not_clip, L, H):

        if alpha_not_clip < L:
            return L
        elif alpha_not_clip > H:
            return H
        else:
            return alpha_not_clip

    def kernal(self, x_i, x_j):
        if self.Kernal == "linear":
            return np.dot(x_i, x_j)
        elif self.Kernal == "poly":
            return pow(np.dot(x_i, x_j)+1, 2)

    def train(self):

        for _ in range(self.max_iters):
            pair = self.search_sample()
            if pair is None:
                # every sample satisfies the KKT conditions: converged early
                break
            i, j = pair
            eta = self.com_eta(i, j)
            if self.Y[i] == self.Y[j]:
                L = max(0, self.alpha[i]+self.alpha[j] - self.C)
                H = min(self.C, self.alpha[i]+self.alpha[j])
            else:
                L = max(0, self.alpha[j]-self.alpha[i])
                H = min(self.C, self.C + self.alpha[j]-self.alpha[i])

            E1 = self.E[i]
            E2 = self.E[j]


            if eta <= 0:
                continue

            alpha2_new_unclip = self.alpha[j] + self.Y[j]*(E1 - E2) / eta
            alpha2 = self.compare(alpha2_new_unclip, L, H)
            alpha1 = self.alpha[i] + self.Y[i] * self.Y[j] * (self.alpha[j] - alpha2)
            
            b1_new = -E1 - self.Y[i] * self.kernal(self.X[i], self.X[i]) * (
                alpha1-self.alpha[i]) - self.Y[j] * self.kernal(self.X[j], self.X[i]) * (alpha2-self.alpha[j]) + self.b
            b2_new = -E2 - self.Y[i] * self.kernal(self.X[i], self.X[j]) * (
                alpha1-self.alpha[i]) - self.Y[j] * self.kernal(self.X[j], self.X[j]) * (alpha2-self.alpha[j]) + self.b

            if 0 < alpha1 < self.C:
                b_new = b1_new
            elif 0 < alpha2 < self.C:
                b_new = b2_new
            else:
                b_new = (b1_new + b2_new) / 2

            self.alpha[i] = alpha1
            self.alpha[j] = alpha2
            self.b = b_new
            self.E[i] = self.com_E(i)
            self.E[j] = self.com_E(j)

        print('train done!')

    def predict(self, x):
        
        r = self.kernal(np.multiply(self.alpha, self.Y), np.dot(x, self.X.T)) + self.b
        return 1 if r > 0 else -1

    def score(self, X_test, y_test):
        
        r = 0
        for i in range(len(X_test)):
            x = X_test[i]
            result = self.predict(x)
            if result == y_test[i]:
                r += 1
        return r / len(X_test)

    def weight(self):
        
        yx = self.Y.reshape(-1, 1)*self.X
        self.w = np.dot(yx.T, self.alpha)
        return self.w


def main():
    X, y = create_data()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    plt.scatter(X[:50, 0], X[:50, 1], label='0')
    plt.scatter(X[50:, 0], X[50:, 1], label='1')
    svm = SVM((X_train, y_train))

    svm.init_args()

    svm.train()

    score = svm.score(X_test, y_test)

    print('score', score)

    a1, a2 = svm.weight()
    b = svm.b
    x_min = min(svm.X, key=lambda x: x[0])[0]
    x_max = max(svm.X, key=lambda x: x[0])[0]

    y1, y2 = (-b - a1 * x_min)/a2, (-b - a1 * x_max)/a2
    plt.plot([x_min, x_max], [y1, y2])
    plt.show()


if __name__ == "__main__":
    main()

0x05 Results

(figure: the two classes of iris samples with the fitted separating line)