Derivation of the Logistic Regression Formulas

2019-05-14  若罗

First, use maximum likelihood estimation to find the objective function. For a single sample, the probability of the label `$y \in \{0, 1\}$` given `$x$` is

P(y|x;\theta) = (h_\theta(x))^y(1-h_\theta(x))^{1-y}
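This compact form simply combines the two Bernoulli cases into one expression:

P(y=1|x;\theta) = h_\theta(x), \qquad P(y=0|x;\theta) = 1-h_\theta(x)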

The likelihood function over the `$m$` training samples is

L(\theta) = \prod^m_{i=1}P(y^{(i)}|x^{(i)};\theta) = \prod^m_{i=1}(h_\theta(x^{(i)}))^{y^{(i)}}(1-h_\theta(x^{(i)}))^{1-y^{(i)}}

Taking logarithms, the log-likelihood is

l(\theta)=\sum_{i=1}^m\Big(y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log (1-h_\theta(x^{(i)}))\Big)

The cost function is the negative average log-likelihood, so minimizing `$J(\theta)$` is equivalent to maximizing `$l(\theta)$`:

J(\theta) = -\frac{1}{m}l(\theta)

Gradient descent then updates `$\theta$` step by step:

\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta), \quad (j = 0, \dots, n)

h(x) = \frac1{1+e^{-\theta^Tx}}

g(z) = \frac1{1+e^{-z}}

With the change of variable `$z = \theta^Tx$`, `$h(x)$` becomes `$g(z)$`.

`$\alpha$` is the learning rate, and the partial derivative below gives the gradient direction:

\begin{aligned}
\frac{\partial}{\partial\theta_j}J(\theta) &= -\frac{1}{m}\sum^m_{i=1}\Big( y^{(i)}\frac{1}{h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j}h_\theta(x^{(i)}) - (1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j}h_\theta(x^{(i)}) \Big) \\
&= -\frac{1}{m}\sum^m_{i=1}\Big( y^{(i)}\frac{1}{g(\theta^Tx^{(i)})} - (1-y^{(i)})\frac{1}{1-g(\theta^Tx^{(i)})} \Big)\frac{\partial}{\partial\theta_j}g(\theta^Tx^{(i)}) \\
&= -\frac{1}{m}\sum^m_{i=1}\Big( y^{(i)}\frac{1}{g(\theta^Tx^{(i)})} - (1-y^{(i)})\frac{1}{1-g(\theta^Tx^{(i)})} \Big)g(\theta^Tx^{(i)})\big(1-g(\theta^Tx^{(i)})\big)\frac{\partial}{\partial\theta_j}\theta^Tx^{(i)} \\
&= -\frac{1}{m}\sum^m_{i=1}\Big( y^{(i)}\big(1-g(\theta^Tx^{(i)})\big) - (1-y^{(i)})g(\theta^Tx^{(i)}) \Big)x^{(i)}_j \\
&= -\frac{1}{m}\sum^m_{i=1}\Big( y^{(i)}-g(\theta^Tx^{(i)}) \Big)x^{(i)}_j \\
&= -\frac{1}{m}\sum^m_{i=1}\Big( y^{(i)}-h(x^{(i)}) \Big)x^{(i)}_j \\
&= \frac{1}{m}\sum^m_{i=1}\Big( h(x^{(i)})-y^{(i)} \Big)x^{(i)}_j
\end{aligned}

The step from the third line to the fourth uses the same partial derivative as in linear regression, `$\frac{\partial}{\partial\theta_j}\theta^Tx^{(i)} = x^{(i)}_j$`, so it is not expanded here.

The step from the second line to the third uses the derivative of the sigmoid function, which is derived in detail below.
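For completeness, the sigmoid derivative used in that step:

\begin{aligned}
g'(z) &= \frac{d}{dz}\frac{1}{1+e^{-z}} = \frac{e^{-z}}{(1+e^{-z})^2} \\
&= \frac{1}{1+e^{-z}}\cdot\Big(1 - \frac{1}{1+e^{-z}}\Big) = g(z)\big(1-g(z)\big)
\end{aligned}

By the chain rule, `$\frac{\partial}{\partial\theta_j}g(\theta^Tx^{(i)}) = g(\theta^Tx^{(i)})\big(1-g(\theta^Tx^{(i)})\big)\frac{\partial}{\partial\theta_j}\theta^Tx^{(i)}$`, which is exactly the factor pulled out between those two lines.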

The clean form of the final line, the prediction error `$h(x^{(i)}) - y^{(i)}$` times the feature value `$x^{(i)}_j$`, is precisely why we choose the sigmoid function.
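To make the result concrete, here is a minimal NumPy sketch of batch gradient descent using the derived gradient. The names (`sigmoid`, `gradient_descent`, `X`, `y`, `alpha`) are illustrative, not from the original post.

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression.

    X: (m, n) design matrix; prepend a column of ones for the bias theta_0.
    y: (m,) labels in {0, 1}.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # h_theta(x^{(i)}) for all i
        grad = (X.T @ (h - y)) / m    # (1/m) * sum_i (h - y^{(i)}) * x^{(i)}_j
        theta -= alpha * grad         # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta

# Toy usage: learn a threshold on a one-dimensional feature.
X = np.column_stack([np.ones(6), np.array([-3., -2., -1., 1., 2., 3.])])
y = np.array([0., 0., 0., 1., 1., 1.])
theta = gradient_descent(X, y)
print(theta, sigmoid(X @ theta).round(2))
```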
