The Naive Bayes Method

2019-03-22  shenghaishxt

This article originally appeared on my personal blog: https://www.zhangshenghai.com/posts/62831/

Learning and Classification with Naive Bayes

The training dataset
\begin{align*} \\& T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\} \end{align*}
is generated independently and identically distributed according to P \left( X, Y \right). Here x_{i} \in \mathcal{X} \subseteq R^{n} and y_{i} \in \mathcal{Y} = \left\{ c_{1}, c_{2}, \cdots, c_{K} \right\} for i = 1, 2, \cdots, N; x_{i} is the i-th feature vector (instance), y_{i} is the class label of x_{i}, X is a random vector defined on the input space \mathcal{X}, and Y is a random variable defined on the output space \mathcal{Y}. P \left( X, Y \right) is the joint probability distribution of X and Y.

Naive Bayes places a conditional independence assumption on the conditional probability distribution, namely
\begin{align*} \\& P \left( X = x | Y = c_{k} \right) = P \left( X^{\left( 1 \right)} = x^{\left( 1 \right)} , \cdots, X^{\left( n \right)} = x^{\left( n \right)} | Y = c_{k}\right) \\ & \quad\quad\quad\quad\quad\quad = \prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right) \end{align*}

That is, the features used for classification are all conditionally independent once the class is given.

To classify a given input x, naive Bayes uses the learned model to compute the posterior distribution P \left( Y = c_{k} | X = x \right) and outputs the class with the largest posterior probability as the class of x. The posterior is computed via Bayes' theorem:
\begin{align} \\& P \left(X = x | Y = c_{k} \right) P \left( Y = c_{k} \right) = P \left( Y = c_{k}| X = x \right) P \left( X = x \right) \\ & P \left( Y = c_{k}| X = x \right) = \dfrac{P \left(X = x | Y = c_{k} \right) P \left( Y = c_{k} \right)}{P \left( X = x \right)} \\ & \quad\quad\quad\quad\quad\quad = \dfrac{P \left(X = x | Y = c_{k} \right) P \left( Y = c_{k} \right)}{\sum_{Y} P \left( X = x, Y = c_{k} \right)} \\ & \quad\quad\quad\quad\quad\quad = \dfrac{P \left(X = x | Y = c_{k} \right) P \left( Y = c_{k} \right)}{\sum_{Y} P \left(X = x | Y = c_{k} \right) P \left( Y = c_{k} \right)} \\ & \quad\quad\quad\quad\quad\quad = \dfrac{ P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right)}{\sum_{Y} P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right)}\end{align}
The naive Bayes classifier can therefore be written as
\begin{align*} \\& y = f \left( x \right) = \arg \max_{c_{k}} \dfrac{ P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right)}{\sum_{Y} P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right)} \\ & \quad\quad\quad = \arg \max_{c_{k}} P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right)\end{align*}
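
Since the denominator \sum_{Y} P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right) is the same for every class, the classifier only has to compare the numerators. Below is a minimal sketch of this decision rule, assuming the probability tables have already been estimated; the names prior, cond and the tuple encoding of x are illustrative, not part of the original text.

```python
# Minimal sketch of the naive Bayes decision rule (illustrative names).
# prior[c]        holds P(Y = c)
# cond[(c, j, v)] holds P(X^(j) = v | Y = c)

def predict(x, prior, cond):
    """Return arg max_c  P(Y = c) * prod_j P(X^(j) = x^(j) | Y = c)."""
    best_class, best_score = None, -1.0
    for c in prior:
        score = prior[c]
        for j, v in enumerate(x):
            score *= cond.get((c, j, v), 0.0)  # unseen feature value -> probability 0 under MLE
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```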

Parameter Estimation for Naive Bayes

Maximum Likelihood Estimation

  1. The maximum likelihood estimate of the prior probability P \left( Y = c_{k} \right) is
    \begin{align} \\& P \left( Y = c_{k} \right) = \dfrac{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right)}{N} \quad k = 1, 2, \cdots, K\end{align}

  2. Let the set of possible values of the j-th feature x^{\left( j \right)} be \left\{ a_{j1}, a_{j2}, \cdots, a_{j S_{j}} \right\}. The maximum likelihood estimate of the conditional probability P \left( X^{\left( j \right)} = a_{jl} | Y = c_{k} \right) is
    \begin{align} \\& P \left( X^{\left( j \right)} = a_{jl} | Y = c_{k} \right) = \dfrac{\sum_{i=1}^{N} I \left(x_{i}^{\left( j \right)}=a_{jl}, y_{i} = c_{k} \right)}{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right)} \\ & j = 1, 2, \cdots, n;\quad l = 1, 2, \cdots, S_{j};\quad k = 1, 2, \cdots, K\end{align}
    where x_{i}^{\left( j \right)} is the j-th feature of the i-th sample, a_{jl} is the l-th possible value of the j-th feature, and I is the indicator function (a counting sketch of both estimates follows below).
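
Both estimates amount to simple counting. A minimal sketch, assuming the training data is given as a list of (feature-tuple, label) pairs and using the same illustrative table layout as the decision-rule sketch above:

```python
from collections import Counter

def fit_mle(samples):
    """Maximum likelihood estimates of P(Y = c_k) and P(X^(j) = a_jl | Y = c_k) by counting."""
    n = len(samples)
    class_counts = Counter(y for _, y in samples)   # sum_i I(y_i = c_k)
    pair_counts = Counter()                         # sum_i I(x_i^(j) = a_jl, y_i = c_k)
    for x, y in samples:
        for j, v in enumerate(x):
            pair_counts[(y, j, v)] += 1

    prior = {c: class_counts[c] / n for c in class_counts}
    cond = {(c, j, v): cnt / class_counts[c] for (c, j, v), cnt in pair_counts.items()}
    return prior, cond
```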

The Naive Bayes Algorithm

Input: a training dataset T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i}= \left( x_{i}^{\left(1\right)},x_{i}^{\left(2\right)},\cdots, x_{i}^{\left(n\right)} \right)^{T}, x_{i}^{\left( j \right)} is the j-th feature of the i-th sample, x_{i}^{\left( j \right)} \in \left\{ a_{j1}, a_{j2}, \cdots, a_{j S_{j}} \right\}, a_{jl} is the l-th possible value of the j-th feature, j = 1, 2, \cdots, n; l = 1, 2, \cdots, S_{j}; y_{i} \in \left\{ c_{1}, c_{2}, \cdots, c_{K} \right\}; and an instance x.

Output: the class of the instance x

  1. Compute the prior and conditional probabilities
    \begin{align*} \\ & P \left( Y = c_{k} \right) = \dfrac{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right)}{N} \quad k = 1, 2, \cdots, K \\ & P \left( X^{\left( j \right)} = a_{jl} | Y = c_{k} \right) = \dfrac{\sum_{i=1}^{N} I \left(x_{i}^{\left( j \right)}=a_{jl}, y_{i} = c_{k} \right)}{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right)} \\ & j = 1, 2, \cdots, n;\quad l = 1, 2, \cdots, S_{j};\quad k = 1, 2, \cdots, K\end{align*}

  2. For the given instance x=\left( x^{\left( 1 \right)}, x^{\left( 2 \right)}, \cdots, x^{\left( n \right)}\right)^{T}, compute
    \begin{align*} \\ & P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right) \quad k=1,2,\cdots,K\end{align*}

  3. Determine the class of the instance x (a small end-to-end sketch of steps 1-3 follows this list)
    \begin{align} \\& y = f \left( x \right) = \arg \max_{c_{k}} P \left( Y = c_{k} \right)\prod_{j=1}^{n} P \left( X^{\left( j \right)} = x^{\left( j \right)} | Y = c_{k} \right) \end{align}
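
Putting the two sketches together, steps 1-3 look as follows on a tiny made-up dataset. The data is purely illustrative and not from the original text, and fit_mle / predict are the hypothetical helpers sketched earlier.

```python
# Tiny illustrative dataset: each sample is ((x^(1), x^(2)), y).
train = [
    ((1, 'S'), -1), ((1, 'M'), -1), ((1, 'M'), 1), ((1, 'S'), 1),
    ((2, 'S'), -1), ((2, 'M'), -1), ((2, 'M'), 1), ((2, 'L'), 1),
    ((3, 'L'), 1),  ((3, 'M'), 1),
]

prior, cond = fit_mle(train)    # step 1: estimate P(Y = c_k) and P(X^(j) = a_jl | Y = c_k)
x = (2, 'S')                    # step 2: score each class for the given instance
print(predict(x, prior, cond))  # step 3: output the arg max class
```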

Bayesian Estimation

Maximum likelihood estimation may yield probability estimates equal to zero, which distorts the computation of the posterior probabilities. The remedy is to use Bayesian estimation.

  1. The Bayesian estimate of the prior probability is
    \begin{align*} \\& P_{\lambda} \left( Y = c_{k} \right) = \dfrac{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right) + \lambda}{N + K \lambda}\end{align*}

  2. The Bayesian estimate of the conditional probability is
    \begin{align*} \\& P_{\lambda} \left( X^{\left( j \right)} = a_{jl} | Y = c_{k} \right) = \dfrac{\sum_{i=1}^{N} I \left(x_{i}^{\left( j \right)}=a_{jl}, y_{i} = c_{k} \right) + \lambda}{\sum_{i=1}^{N} I \left( y_{i} = c_{k} \right) + S_{j} \lambda} \end{align*}
    where \lambda \geq 0. When \lambda = 0, this reduces to the maximum likelihood estimate; when \lambda = 1, it is called Laplace smoothing (see the counting sketch below).
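
A minimal sketch of the smoothed counting scheme, again with illustrative names; S_{j} and K are recovered from the observed feature values and labels, and lam = 1 corresponds to Laplace smoothing.

```python
from collections import Counter

def fit_bayes(samples, lam=1.0):
    """Smoothed (Bayesian) estimates; lam = 0 recovers the MLE, lam = 1 is Laplace smoothing."""
    n = len(samples)
    class_counts = Counter(y for _, y in samples)
    pair_counts = Counter()
    values = {}                                   # j -> observed values {a_j1, ..., a_jS_j}
    for x, y in samples:
        for j, v in enumerate(x):
            pair_counts[(y, j, v)] += 1
            values.setdefault(j, set()).add(v)

    K = len(class_counts)                         # number of classes
    prior = {c: (class_counts[c] + lam) / (n + K * lam) for c in class_counts}
    cond = {
        (c, j, v): (pair_counts[(c, j, v)] + lam) / (class_counts[c] + len(values[j]) * lam)
        for c in class_counts for j in values for v in values[j]
    }
    return prior, cond
```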
