Linear Classifier

2018-12-13  BigPeter

Definition


If the input feature vector is a real vector \vec{x}, then the output score is:

{\displaystyle y=f({\vec {w}}\cdot {\vec {x}})=f\left(\sum _{j}w_{j}x_{j}\right)}

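where \vec{w} is a real vector of weights and f is a function that maps the dot product of the two vectors to the desired output. The weight vector \vec{w} is learned from a set of labeled training samples. Often f is a simple threshold function that maps every value of \vec{w}\cdot\vec{x} above a certain threshold to the first class and all other values to the second class. A more complex f might instead give the probability that an item belongs to a certain class.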

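For a binary classification problem, a linear classifier can be pictured as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as "yes", and all points on the other side as "no".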
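The definition above can be sketched in a few lines of Python. This is only an illustration: the weights, the inputs, the zero threshold, and the use of the logistic function as the "more complex f" are assumptions chosen for the example, not values taken from the text.

```python
import math

def score(w, x):
    """Dot product w . x of two equal-length sequences."""
    return sum(wj * xj for wj, xj in zip(w, x))

def threshold_f(s, threshold=0.0):
    """Simple f: class 1 if the score exceeds the threshold, else class 0."""
    return 1 if s > threshold else 0

def logistic_f(s):
    """A more complex f: squashes the score into (0, 1), readable as P(class 1)."""
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical weights and inputs, chosen only for illustration.
w = [2.0, -1.0]
x_yes = [1.0, 0.5]   # w . x = 1.5, on the positive side of the hyperplane w . x = 0
x_no = [0.0, 1.0]    # w . x = -1.0, on the negative side

label_yes = threshold_f(score(w, x_yes))   # "yes" side
label_no = threshold_f(score(w, x_no))     # "no" side
prob_yes = logistic_f(score(w, x_yes))     # probability-like output
```

With a zero threshold, the decision boundary is exactly the hyperplane \vec{w}\cdot\vec{x}=0, so the sign of the score tells which side a point lies on.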

Types


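There are two broad families of linear classification models for determining \vec{w}: generative models and discriminative models.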

Generative models: linear discriminant analysis (LDA), naive Bayes

Discriminative models: logistic regression, perceptron, support vector machine
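As one concrete discriminative example, the classic perceptron update rule learns \vec{w} directly from labeled mistakes. The sketch below is a minimal version under stated assumptions: labels are encoded as -1 and +1, a separate bias term b is added (the definition above has no bias; it can equivalently be folded into \vec{w} with a constant feature), and the toy data, learning rate, and epoch limit are hypothetical.

```python
def predict(w, b, x):
    """Return +1 or -1 depending on the sign of w . x + b."""
    s = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if s > 0 else -1

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: on each misclassified sample, move w toward y * x."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            if predict(w, b, x) != y:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:   # every sample classified correctly, stop early
            break
    return w, b

# Hypothetical, linearly separable toy data: (features, label in {-1, +1}).
samples = [
    ([2.0, 1.0], 1),
    ([1.0, 3.0], 1),
    ([-1.0, -1.5], -1),
    ([-2.0, 0.5], -1),
]
w, b = train_perceptron(samples)
```

On linearly separable data such as this toy set, the loop stops as soon as a full pass produces no mistakes; on non-separable data it simply runs until the epoch limit.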

Using kernel methods


[To be written]

Parameter learning / model training


Discriminative training of linear classifiers usually proceeds in a supervised way, by means of an optimization algorithm that is given a training set with desired outputs and a loss function that measures the discrepancy between the classifier's outputs and the desired outputs. Thus, the learning algorithm solves an optimization problem of the form

{\displaystyle {\underset {\mathbf {w} }{\arg \min }}\;R(\mathbf {w} )+C\sum _{i=1}^{N}L(y_{i},\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i})}

where

w is a vector of classifier parameters,

L(y_{i},\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i}) is a loss function that measures the discrepancy between the classifier's prediction and the true output y_{i} for the i-th training example,

R(w) is a regularization function that prevents the parameters from getting too large (causing overfitting), and

C is a scalar constant (set by the user of the learning algorithm) that controls the balance between the regularization and the loss function.
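To make the roles of R, L, and C concrete, the objective can be evaluated directly for a given w. The sketch below is illustrative only: it assumes labels in {-1, +1}, picks R(w) = 0.5 * ||w||^2 as one common convex regularizer, and uses the hinge and log losses as two common choices of L. The function names and toy data are hypothetical.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def hinge_loss(y, s):
    """Hinge loss for a label y in {-1, +1} and a score s = w^T x."""
    return max(0.0, 1.0 - y * s)

def log_loss(y, s):
    """Logistic loss for a label y in {-1, +1} and a score s = w^T x."""
    return math.log(1.0 + math.exp(-y * s))

def l2_regularizer(w):
    """Assumed regularizer: R(w) = 0.5 * ||w||^2."""
    return 0.5 * dot(w, w)

def objective(w, data, C, loss, regularizer=l2_regularizer):
    """Evaluate R(w) + C * sum_i L(y_i, w^T x_i)."""
    return regularizer(w) + C * sum(loss(y, dot(w, x)) for x, y in data)

# Hypothetical toy data: (features, label in {-1, +1}).
data = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w = [1.0, 0.0]

value_hinge = objective(w, data, C=1.0, loss=hinge_loss)
value_log = objective(w, data, C=1.0, loss=log_loss)
```

Raising C makes the loss term dominate, so training errors are penalized more heavily relative to large weights; lowering C does the opposite.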

Popular loss functions include the hinge loss (for linear SVMs) and the log loss (for linear logistic regression). Both of these losses are convex, so if the regularization function R is also convex, the problem above is a convex problem. Many algorithms exist for solving such problems; popular ones for linear classification include (stochastic) gradient descent, L-BFGS, coordinate descent and Newton methods.
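As a rough illustration of one of these solvers, the sketch below runs plain (full-batch) gradient descent on the log-loss objective with an assumed L2 regularizer R(w) = 0.5 * ||w||^2. Labels are assumed to be in {-1, +1}; the toy data, step size, and iteration count are hypothetical, and a real implementation would normally rely on an established library rather than this hand-written loop.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def objective(w, data, C):
    """0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i * w^T x_i))."""
    reg = 0.5 * dot(w, w)
    return reg + C * sum(math.log(1.0 + math.exp(-y * dot(w, x))) for x, y in data)

def gradient(w, data, C):
    """Gradient of the objective above with respect to w."""
    grad = list(w)  # gradient of 0.5 * ||w||^2 is w itself
    for x, y in data:
        # d/ds log(1 + exp(-y * s)) = -y / (1 + exp(y * s))
        coeff = -y / (1.0 + math.exp(y * dot(w, x)))
        grad = [g + C * coeff * xj for g, xj in zip(grad, x)]
    return grad

def gradient_descent(data, C=1.0, lr=0.1, steps=200):
    """Repeatedly step against the gradient, starting from w = 0."""
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        g = gradient(w, data, C)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

# Hypothetical toy data: (features, label in {-1, +1}).
data = [
    ([2.0, 1.0], 1),
    ([1.0, 2.0], 1),
    ([-1.0, -1.0], -1),
    ([-2.0, 0.5], -1),
]
w = gradient_descent(data)
```

Because the objective is convex, gradient descent with a sufficiently small step size approaches the global minimum rather than a local one. The stochastic variant would use the gradient of a single sample (or a small batch) per step instead of the full sum.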
