Maximum Likelihood Estimation for the k-Class GDA Model

2019-03-26  deBroglie

Modeling

1) Assume the target variable $y$ follows a $k$-class multinomial (categorical) distribution with parameters $(\phi_1, \cdots, \phi_{k-1})$, where $\phi_0 = 1 - \sum_{j=1}^{k-1} \phi_j$:

$$\begin{aligned} p(y; \phi_1, \cdots, \phi_{k-1}) &= \phi_{0}^{\mathbf{1}\{y = 0\}} \phi_{1}^{\mathbf{1}\{y = 1\}} \cdots \phi_{k-1}^{\mathbf{1}\{y = k-1\}} \\ &= \Big(1 - \sum_{j=1}^{k-1} \phi_{j}\Big)^{1 - \sum_{j=1}^{k-1} \mathbf{1}\{y = j\}} \phi_{1}^{\mathbf{1}\{y = 1\}} \cdots \phi_{k-1}^{\mathbf{1}\{y = k-1\}} \end{aligned}$$

where $\mathbf{1}\{\cdot\}$ is the indicator function, with $\mathbf{1}\{\text{True}\} = 1$ and $\mathbf{1}\{\text{False}\} = 0$.

2) Assume further that, within every class, the feature vector follows a Gaussian distribution, and that all classes share one covariance matrix $\Sigma$:

$$\vec{x} \mid y=j \ \sim\ N(\vec{\mu}_{j}, \Sigma), \quad j=0, 1, \cdots, k-1$$

$$p(\vec{x} \mid y=j) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big[-\frac{1}{2}(\vec{x}-\vec{\mu}_{j})^{T}\Sigma^{-1}(\vec{x}-\vec{\mu}_{j})\Big], \quad j=0, 1, \cdots, k-1$$

Here $n$ is the dimension of $\vec{x}$.

The full set of model parameters is $\phi_1,\cdots,\phi_{k-1},\vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1},\Sigma$.
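One way to read the two assumptions is as a recipe for generating data: first draw a class label, then draw a feature vector from that class's Gaussian. The following is a minimal sketch using only the Python standard library. The function name `sample_gda`, the choice to pass the full probability list including $\phi_0$, and the parameterization of the shared covariance through a lower-triangular factor `chol` with $\Sigma = L L^{T}$ are all assumptions made for this illustration.

```python
import random


def sample_gda(m, phi, mus, chol, seed=0):
    """Draw m pairs (x, y) from the generative model (illustrative sketch).

    phi  : list of k class probabilities, phi_0 included, summing to 1
    mus  : list of k mean vectors, each of length n
    chol : lower-triangular n x n factor L with Sigma = L L^T (shared by all classes)
    """
    rng = random.Random(seed)
    k, n = len(phi), len(mus[0])
    X, y = [], []
    for _ in range(m):
        # y ~ Categorical(phi_0, ..., phi_{k-1})
        j = rng.choices(range(k), weights=phi)[0]
        # x | y=j ~ N(mu_j, Sigma), realized as x = mu_j + L z with z ~ N(0, I)
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [mus[j][r] + sum(chol[r][c] * z[c] for c in range(r + 1))
             for r in range(n)]
        X.append(x)
        y.append(j)
    return X, y


# Hypothetical parameters: k = 3 classes in n = 2 dimensions.
X, y = sample_gda(
    5,
    phi=[0.5, 0.3, 0.2],
    mus=[[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]],
    chol=[[1.0, 0.0], [0.5, 1.0]],
)
```

Every class uses the same `chol`, which is exactly the shared-covariance assumption in item 2.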

Log-likelihood function

Given $m$ training samples $(\vec{x}^{(i)}, y^{(i)})$, $i = 1, \cdots, m$, the log-likelihood is

$$\begin{aligned} & l(\phi_1,\cdots,\phi_{k-1},\vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1},\Sigma) \\ =& \log\prod_{i=1}^{m} p(\vec{x}^{(i)}, y^{(i)}; \phi_1,\cdots,\phi_{k-1},\vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1},\Sigma) \\ =& \log\prod_{i=1}^{m} p(\vec{x}^{(i)} \mid y^{(i)}; \vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1}, \Sigma)\, p(y^{(i)}; \phi_1,\cdots,\phi_{k-1}) \\ =& \sum_{i=1}^{m} \log p(\vec{x}^{(i)} \mid y^{(i)}; \vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1}, \Sigma) + \sum_{i=1}^{m} \log p(y^{(i)}; \phi_1,\cdots,\phi_{k-1}) \\ =& \sum_{i=1}^{m} \log\prod_{j=0}^{k-1} p(\vec{x}^{(i)} \mid y^{(i)}=j; \vec{\mu}_{j}, \Sigma)^{\mathbf{1}\{y^{(i)} = j\}} + \sum_{i=1}^{m} \log p(y^{(i)}; \phi_1,\cdots,\phi_{k-1}) \\ =& \sum_{i=1}^{m} \sum_{j=0}^{k-1} \mathbf{1}\{y^{(i)} = j\} \log p(\vec{x}^{(i)} \mid y^{(i)}=j; \vec{\mu}_{j}, \Sigma) + \sum_{i=1}^{m} \log p(y^{(i)}; \phi_1,\cdots,\phi_{k-1}) \end{aligned}$$
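The last line of the derivation says the log-likelihood is a sum, over samples, of the Gaussian log-density of the sample's own class plus the log-probability of its label. A minimal sketch of that evaluation in standard-library Python follows. The helper names (`cholesky`, `log_gaussian`, `log_likelihood`) are hypothetical, `phi` is assumed to be the full list including $\phi_0$, and a Cholesky factorization is used here only as a convenient way to get $\log|\Sigma|$ and the quadratic form; it is not part of the original derivation.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def log_gaussian(x, mu, L):
    """log N(x; mu, Sigma) where Sigma = L L^T."""
    n = len(x)
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L z = d, so z.z = d^T Sigma^{-1} d.
    z = []
    for i in range(n):
        z.append((d[i] - sum(L[i][t] * z[t] for t in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|Sigma|
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def log_likelihood(X, y, phi, mus, sigma):
    """Sum over samples of log p(x | y) + log p(y)."""
    L = cholesky(sigma)
    return sum(log_gaussian(x, mus[c], L) + math.log(phi[c])
               for x, c in zip(X, y))


# Hypothetical toy data: two samples, each sitting exactly on its class mean.
ll = log_likelihood(
    X=[[0.0, 0.0], [2.0, 0.0]],
    y=[0, 1],
    phi=[0.5, 0.5],
    mus=[[0.0, 0.0], [2.0, 0.0]],
    sigma=[[1.0, 0.0], [0.0, 1.0]],
)
```

Because each sample only contributes the term for its own class, the indicator $\mathbf{1}\{y^{(i)} = j\}$ never appears explicitly: indexing `mus[c]` and `phi[c]` plays that role.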

Maximizing the log-likelihood

1) Partial derivative with respect to $\phi_{j}$, $j=1,\cdots,k-1$. Only the second sum of the log-likelihood depends on $\phi_j$:

$$\begin{aligned} & \frac{\partial l(\phi_1,\cdots,\phi_{k-1},\vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1},\Sigma)}{\partial \phi_j} = \frac{\partial}{\partial \phi_j}\Big(\sum_{i=1}^{m} \log p(y^{(i)}; \phi_1,\cdots,\phi_{k-1})\Big) \\ =& \frac{\partial}{\partial \phi_j} \Big(\sum_{i=1}^{m} \log\Big[\Big(1-\sum_{j'=1}^{k-1}\phi_{j'}\Big)^{1 - \sum_{j'=1}^{k-1} \mathbf{1}\{y^{(i)} = j'\}} \phi_{1}^{\mathbf{1}\{y^{(i)} = 1\}} \cdots \phi_{k-1}^{\mathbf{1}\{y^{(i)}=k-1\}}\Big]\Big) \\ =& \frac{\partial}{\partial \phi_j} \Big(\sum_{i=1}^m \Big[\Big(1- \sum_{j'=1}^{k-1}\mathbf{1}\{y^{(i)} = j'\}\Big) \log\Big(1 - \sum_{j'=1}^{k-1}\phi_{j'}\Big) + \sum_{j'=1}^{k-1} \mathbf{1}\{y^{(i)} = j'\} \log\phi_{j'}\Big]\Big) \\ =& \sum_{i=1}^m \Big[\Big(1- \sum_{j'=1}^{k-1} \mathbf{1}\{y^{(i)} = j'\}\Big)\cdot\Big(1 - \sum_{j'=1}^{k-1}\phi_{j'}\Big)^{-1} \cdot (-1) + \mathbf{1}\{y^{(i)} = j\}\cdot\phi_{j}^{-1}\Big] \triangleq 0 \end{aligned}$$

Setting the derivative to zero gives

$$\Big(1 - \sum_{j'=1}^{k-1}\phi_{j'}\Big)^{-1} \cdot \sum_{i=1}^m \Big(1- \sum_{j'=1}^{k-1} \mathbf{1}\{y^{(i)} = j'\}\Big) = \phi_{j}^{-1} \cdot \sum_{i=1}^m \mathbf{1}\{y^{(i)} = j\}$$

that is,

$$\frac{\phi_{j}}{\sum_{i=1}^m \mathbf{1}\{y^{(i)}=j\}} = \frac{1 - \sum_{j'=1}^{k-1}\phi_{j'}}{\sum_{i=1}^m \big(1 - \sum_{j'=1}^{k-1} \mathbf{1}\{y^{(i)} = j'\}\big)} = \frac{\phi_{0}}{\sum_{i=1}^m \mathbf{1}\{y^{(i)}=0\}}$$

$$\Rightarrow \quad \phi_{j} = \frac{\phi_{0}}{\sum_{i=1}^m \mathbf{1}\{y^{(i)} = 0\}} \sum_{i=1}^m \mathbf{1}\{y^{(i)}=j\} \qquad (\star)$$

Equation $(\star)$ also holds trivially for $j=0$. Summing both sides over $j = 0, \cdots, k-1$, the left-hand side becomes

$$\sum_{j=0}^{k-1}\phi_{j}=1$$

and the right-hand side becomes

$$\frac{\phi_{0}}{\sum_{i=1}^m \mathbf{1}\{y^{(i)} = 0\}}\sum_{j=0}^{k-1} \sum_{i=1}^{m} \mathbf{1}\{y^{(i)}=j\} = m\cdot\frac{\phi_{0}}{\sum_{i=1}^m \mathbf{1}\{y^{(i)}=0\}}$$

because every sample belongs to exactly one class. Therefore $\frac{\phi_{0}}{\sum_{i=1}^m \mathbf{1}\{y^{(i)}=0\}} = \frac{1}{m}$, and substituting this into $(\star)$ yields

$$\phi_{j} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{y^{(i)}=j\}, \quad j=1,\cdots,k-1$$

In other words, the maximum likelihood estimate of each class probability is the empirical frequency of that class in the training set.
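The closed form for $\phi_j$ amounts to counting labels and dividing by $m$. A minimal sketch, assuming labels are integers in $0, \cdots, k-1$; the function name `estimate_priors` is made up for this example, and it returns $\phi_0$ as well so that the list sums to 1.

```python
from collections import Counter


def estimate_priors(y, k):
    """MLE of the class probabilities: phi_j = (number of samples with y = j) / m."""
    m = len(y)
    counts = Counter(y)
    # Counter returns 0 for a class that never appears in y.
    return [counts[j] / m for j in range(k)]


# Hypothetical labels for m = 6 samples and k = 3 classes.
y = [0, 2, 1, 2, 2, 0]
phi = estimate_priors(y, k=3)
```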
2) Partial derivative with respect to $\vec{\mu}_j$, $j=0,1,\cdots, k-1$:

$$\begin{aligned} & \frac{\partial l(\phi_1,\cdots,\phi_{k-1},\vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1},\Sigma)}{\partial \vec{\mu}_j} \\ =& \frac{\partial}{\partial \vec{\mu}_j} \Big(\sum_{i=1}^{m} \mathbf{1}\{y^{(i)} = j\} \log p(\vec{x}^{(i)} \mid y^{(i)}=j; \vec{\mu}_{j}, \Sigma)\Big) \\ =& \frac{\partial}{\partial \vec{\mu}_j} \Big(\sum_{i=1}^{m} \mathbf{1}\{y^{(i)} = j\} \Big[\log\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}} - \frac{1}{2}(\vec{x}^{(i)} - \vec{\mu}_j)^{T}\Sigma^{-1}(\vec{x}^{(i)} - \vec{\mu}_j)\Big]\Big) \\ =& \sum_{i=1}^{m} \mathbf{1}\{y^{(i)} = j\} \Big(2\cdot\Big(-\frac{1}{2}\Big)\cdot\Sigma^{-1}(\vec{x}^{(i)} - \vec{\mu}_j)\cdot(-1)\Big) \\ =& \sum_{i=1}^{m} \mathbf{1}\{y^{(i)} = j\}\,\Sigma^{-1}(\vec{x}^{(i)} - \vec{\mu}_j) \triangleq 0 \end{aligned}$$

$$\Rightarrow \quad \Sigma^{-1}\vec{\mu}_j = \frac{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)} = j\}\,\Sigma^{-1}\vec{x}^{(i)}}{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)} = j\}} \quad \Rightarrow \quad \vec{\mu}_j = \frac{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)} = j\}\,\vec{x}^{(i)}}{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)} = j\}}$$

The denominator is the number of samples with $y^{(i)} = j$, and the numerator is the sum of $\vec{x}^{(i)}$ over those samples. So $\vec{\mu}_j$ is the mean of the feature vectors in class $j$, as expected.
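The estimate of $\vec{\mu}_j$ can be computed in one pass by accumulating a per-class sum and a per-class count. The sketch below uses plain Python lists; `estimate_means` is a hypothetical name, and the code assumes every class in $0, \cdots, k-1$ appears at least once, since otherwise the denominator in the formula is zero.

```python
def estimate_means(X, y, k):
    """MLE of the class means: mu_j = average of x over samples with y = j.

    Assumes every class 0..k-1 occurs at least once in y.
    """
    n = len(X[0])
    sums = [[0.0] * n for _ in range(k)]   # numerator of the formula, per class
    counts = [0] * k                        # denominator of the formula, per class
    for x, j in zip(X, y):
        counts[j] += 1
        for r in range(n):
            sums[j][r] += x[r]
    return [[s / counts[j] for s in sums[j]] for j in range(k)]


# Hypothetical toy data: m = 4 samples, n = 2 features, k = 2 classes.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y = [0, 0, 1, 1]
mus = estimate_means(X, y, k=2)
```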
3) Partial derivative with respect to $\Sigma$. For symmetric $\Sigma$ and any vector $\vec{a}$, the derivation uses the identities $\frac{\partial \log|\Sigma|}{\partial \Sigma} = \Sigma^{-1}$ and $\frac{\partial (\vec{a}^{T}\Sigma^{-1}\vec{a})}{\partial \Sigma} = -\Sigma^{-1}\vec{a}\vec{a}^{T}\Sigma^{-1}$, together with $\sum_{i=1}^{m}\sum_{j=0}^{k-1}\mathbf{1}\{y^{(i)}=j\} = m$:

$$\begin{aligned} & \frac{\partial l(\phi_1,\cdots,\phi_{k-1},\vec{\mu}_{0},\vec{\mu}_{1},\cdots,\vec{\mu}_{k-1},\Sigma)}{\partial \Sigma} \\ =& \frac{\partial}{\partial\Sigma} \Big[\sum_{i=1}^{m} \sum_{j=0}^{k-1} \mathbf{1}\{y^{(i)}=j\} \log p(\vec{x}^{(i)} \mid y^{(i)}=j; \vec{\mu}_{j}, \Sigma)\Big] \\ =& \frac{\partial}{\partial \Sigma} \Big[\sum_{i=1}^{m} \sum_{j=0}^{k-1} \mathbf{1}\{y^{(i)} = j\}\Big(\log\frac{1}{(2\pi)^{\frac{n}{2}} |\Sigma|^{\frac{1}{2}}} - \frac{1}{2}(\vec{x}^{(i)} - \vec{\mu}_j)^{T}\Sigma^{-1}(\vec{x}^{(i)} - \vec{\mu}_j)\Big)\Big] \\ =& -\frac{m}{2} \frac{\partial\log|\Sigma|}{\partial\Sigma} -\frac{1}{2} \frac{\partial}{\partial\Sigma} \sum_{i=1}^{m} \sum_{j=0}^{k-1} \mathbf{1}\{y^{(i)} = j\} (\vec{x}^{(i)} - \vec{\mu}_j)^{T}\Sigma^{-1}(\vec{x}^{(i)} - \vec{\mu}_j) \\ =& -\frac{m}{2}\Sigma^{-1} + \frac{1}{2}\, \Sigma^{-1} \Big(\sum_{i=1}^{m} \sum_{j=0}^{k-1} \mathbf{1}\{y^{(i)} = j\} (\vec{x}^{(i)} - \vec{\mu}_j)(\vec{x}^{(i)} - \vec{\mu}_j)^{T}\Big) \Sigma^{-1} \triangleq 0 \end{aligned}$$

Multiplying by $\Sigma$ on the left and on the right and solving for $\Sigma$ gives

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} \sum_{j=0}^{k-1} \mathbf{1}\{y^{(i)} = j\} (\vec{x}^{(i)} - \vec{\mu}_j)(\vec{x}^{(i)} - \vec{\mu}_{j})^{T}$$

This result is also as expected: $\Sigma$ is the sample covariance pooled over all classes, with each sample centered on the mean of its own class.
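The estimate of $\Sigma$ averages the outer products of the deviations of each sample from its own class mean. A minimal sketch with plain Python lists follows; `estimate_shared_covariance` is a hypothetical name, and `mus` is assumed to hold the class means already estimated for the same data.

```python
def estimate_shared_covariance(X, y, mus):
    """MLE of the shared covariance: (1/m) * sum_i (x_i - mu_{y_i})(x_i - mu_{y_i})^T."""
    m, n = len(X), len(X[0])
    sigma = [[0.0] * n for _ in range(n)]
    for x, j in zip(X, y):
        # Deviation of the sample from the mean of its own class.
        d = [x[r] - mus[j][r] for r in range(n)]
        # Accumulate the outer product d d^T, scaled by 1/m.
        for r in range(n):
            for c in range(n):
                sigma[r][c] += d[r] * d[c] / m
    return sigma


# Hypothetical toy data: m = 4 samples, n = 2 features, k = 2 classes.
X = [[0.0, 0.0], [2.0, 0.0], [5.0, 1.0], [5.0, 3.0]]
y = [0, 0, 1, 1]
mus = [[1.0, 0.0], [5.0, 2.0]]   # class means of the data above
sigma = estimate_shared_covariance(X, y, mus)
```

Note the divisor is $m$, the total number of samples, not the per-class count: all classes contribute to a single matrix.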

Binary classification is the special case $k = 2$ of the $k$-class model.
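As a quick check, setting $k = 2$ in the three estimates derived above gives the following. This is only a restatement of those results with $j \in \{0, 1\}$; the shorthand $\vec{\mu}_{y^{(i)}}$ for "the mean of the class that sample $i$ belongs to" is introduced here for compactness.

```latex
\begin{aligned}
\phi_1 &= \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{y^{(i)}=1\}, \qquad \phi_0 = 1-\phi_1 \\
\vec{\mu}_0 &= \frac{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)}=0\}\,\vec{x}^{(i)}}{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)}=0\}}, \qquad
\vec{\mu}_1 = \frac{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)}=1\}\,\vec{x}^{(i)}}{\sum_{i=1}^{m}\mathbf{1}\{y^{(i)}=1\}} \\
\Sigma &= \frac{1}{m}\sum_{i=1}^{m}\big(\vec{x}^{(i)}-\vec{\mu}_{y^{(i)}}\big)\big(\vec{x}^{(i)}-\vec{\mu}_{y^{(i)}}\big)^{T}
\end{aligned}
```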
