Solving Logistic Regression

2019-12-30  Allen_9820

Basic concepts

E(y|x)=P(y=1|x)

log(\frac{p}{1-p})=\boldsymbol{X}\boldsymbol{\beta}

P(Y=1|X)=\hat{p}=\frac{\exp(\boldsymbol{X}\hat{\boldsymbol{\beta}})}{1+\exp(\boldsymbol{X}\hat{\boldsymbol{\beta}})}
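The model above can be evaluated directly in NumPy. This is a minimal sketch (the function name `predict_proba` is my own); it computes the fitted probability exp(Xβ)/(1+exp(Xβ)) in a numerically stable way:

```python
import numpy as np

def predict_proba(X, beta):
    """P(Y=1|X) under the logistic model: exp(X beta) / (1 + exp(X beta))."""
    eta = X @ beta
    # Equivalent stable forms of the logistic function, chosen by sign of eta
    # to avoid overflow in exp().
    return np.where(eta >= 0,
                    1.0 / (1.0 + np.exp(-eta)),
                    np.exp(eta) / (1.0 + np.exp(eta)))
```

At β = 0 every fitted probability is 0.5, which is a quick sanity check.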

\beta is estimated by maximum likelihood (MLE), but the likelihood equations have no closed-form solution, so an iterative method known as IRLS is used.

Solution steps

First, write down the likelihood function:
L(\beta)=\prod_{i=1}^nf(x_i;\beta)
where f(x_i;\beta) is the density of the i-th sample, f(x_i;\beta)=p(x_i)^{y_i}(1-p(x_i))^{1-y_i}, so
\begin{align}L(\beta)&=\prod_{i=1}^np(x_i)^{y_i}(1-p(x_i))^{1-y_i}\\l(\beta)&=\log L(\beta)=\sum_{i=1}^n\{y_i\log{p(x_i;\beta)}+(1-y_i)\log (1-p(x_i;\beta))\}\\&=\sum_{i=1}^n\left[y_i(x_i^T\beta-\log(1+e^{x_i^T\beta}))+(1-y_i)(-\log(1+e^{x_i^T\beta}))\right]\\&=\sum_{i=1}^n\left[y_ix_i^T\beta-\log(1+e^{x_i^T\beta})\right]\end{align}
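The final line of the log-likelihood translates directly into code. A minimal NumPy sketch (function name my own), using `np.logaddexp` so that log(1+e^{x'β}) does not overflow:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """l(beta) = sum_i [ y_i x_i' beta - log(1 + exp(x_i' beta)) ]."""
    eta = X @ beta
    # np.logaddexp(0, eta) computes log(exp(0) + exp(eta)) = log(1 + exp(eta))
    # without overflow for large eta.
    return np.sum(y * eta - np.logaddexp(0.0, eta))
```

At β = 0 every term reduces to −log 2, so l(0) = −n log 2, a handy check.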
Now take the partial derivative of l with respect to \beta:
\begin{align}\frac{\partial l(\beta)}{\partial \beta}&=\sum_{i=1}^n\left[y_i\frac{\partial x_i^T\beta}{\partial \beta}-\frac{1}{1+e^{x_i^T\beta}}\frac{\partial e^{x_i^T\beta}}{\partial \beta}\right]\\&=\sum_{i=1}^n\left(y_ix_i-\frac{e^{x_i^T\beta}}{1+e^{x_i^T\beta}}\cdot x_i\right)\\&=\sum_{i=1}^nx_i\left(y_i-\frac{e^{x_i^T\beta}}{1+e^{x_i^T\beta}}\right)\\&=\sum_{i=1}^nx_i\left(y_i-p(x_i;\beta)\right)=0 \tag{1}\end{align}
Setting this derivative to zero yields no closed-form solution, so we resort to a numerical method, the Newton-Raphson algorithm, which also requires the second derivative (the Hessian matrix):
\begin{align}\frac{\partial}{\partial \beta^T}x_iy_i=0;\\\frac{\partial}{\partial \beta^T}\frac{x_ie^{x_i^T\beta}}{1+e^{x_i^T\beta}}&=\frac{x_ie^{x_i^T\beta}\cdot x_i^T(1+e^{x_i^T\beta})-x_ie^{x_i^T\beta}\cdot e^{x_i^T\beta} x_i^T}{(1+e^{x_i^T\beta})^2}\\&=\frac{x_ie^{x_i^T\beta} x_i^T}{(1+e^{x_i^T\beta})^2}=x_ix_i^T\frac{e^{x_i^T\beta}}{(1+e^{x_i^T\beta})^2}\\&=x_ix_i^T\frac{e^{x_i^T\beta}}{1+e^{x_i^T\beta}}\frac{1}{1+e^{x_i^T\beta}}\\&=x_ix_i^Tp(x_i;\beta)[1-p(x_i;\beta)]\\\frac{\partial^2l(\beta)}{\partial\beta\partial\beta^T}&=-\sum_{i=1}^nx_ix_i^Tp(x_i;\beta)[1-p(x_i;\beta)] \tag{2}\end{align}
Iterate:
\beta^{new}=\beta^{old}-\left(\frac{\partial^2l(\beta)}{\partial\beta\partial\beta^T}\right)^{-1}\frac{\partial l(\beta)}{\partial \beta}\tag{3}
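One Newton-Raphson update, assembling the gradient (1), the Hessian (2), and the step (3), can be sketched in NumPy as follows (the function name `newton_step` is my own):

```python
import numpy as np

def newton_step(beta, X, y):
    """One Newton-Raphson update for the logistic log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # p(x_i; beta)
    grad = X.T @ (y - p)                    # gradient, equation (1)
    W = p * (1.0 - p)                       # p_i (1 - p_i)
    hessian = -(X.T * W) @ X                # Hessian, equation (2)
    # beta_new = beta_old - H^{-1} grad, equation (3)
    return beta - np.linalg.solve(hessian, grad)
```

`X.T * W` scales column i of X^T by W[i], so `(X.T * W) @ X` is X^T diag(W) X without forming the n×n diagonal matrix.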

Matrix form (IRLS)

Writing equation (1) in matrix form:
\frac{\partial l(\beta)}{\partial \beta}=\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p})\tag{4}
where \boldsymbol{X} is the n\times(p+1) design matrix, and \boldsymbol{y} and \boldsymbol{p} are n-dimensional vectors.

Similarly, writing equations (2) and (3) in matrix form:
\begin{align}\frac{\partial^2l(\beta)}{\partial\beta\partial\beta^T}&=-\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}\tag{5}\\\beta^{new}&=\beta^{old}-\left(\frac{\partial^2l(\beta)}{\partial\beta\partial\beta^T}\right)^{-1}\frac{\partial l(\beta)}{\partial \beta}\\&=\beta^{old}+(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X})^{-1}\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p})\\&=(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{W}\left(\boldsymbol{X}\beta^{old}+\boldsymbol{W}^{-1}(\boldsymbol{y}-\boldsymbol{p})\right)\\&=(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{z}\tag{6}\end{align}
where \boldsymbol{W} is the n\times n diagonal matrix with diagonal entries p(x_i;\beta^{old})[1-p(x_i;\beta^{old})], i=1,\ldots,n, and \boldsymbol{z}=\boldsymbol{X}\beta^{old}+\boldsymbol{W}^{-1}(\boldsymbol{y}-\boldsymbol{p}).

Equation (6) has exactly the form of a weighted least squares solution, which is why the method is called iteratively reweighted least squares (IRLS): each iteration can be viewed as solving a weighted least squares problem with working response \boldsymbol{z} and weights \boldsymbol{W}, both recomputed from the current \beta.
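Putting equations (4)–(6) together, the full IRLS loop fits in a few lines. A minimal NumPy sketch (function name and stopping rule are my own choices):

```python
import numpy as np

def irls(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS: each pass solves the weighted
    least-squares system (X' W X) beta = X' W z of equation (6)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)                  # diagonal of W
        z = X @ beta + (y - p) / W         # working response z
        XtW = X.T * W                      # X' diag(W) without the n x n matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Note the loop can fail if the data are perfectly separable (p_i drifts to 0 or 1 and W becomes singular); production implementations add safeguards that this sketch omits.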
