深度学习 - 前向传播和反向传播

2019-09-18  云上听风

References:
English original
"Deep Learning: A Concrete Example of Backpropagation" (Zhihu)
"BP (Backpropagation): Formula Derivation and Worked Examples"

[Figure: network diagram for the example — 2 inputs, 2 hidden nodes, 2 output nodes]

Forward propagation


Data flows forward through the network as:

input -> h_{1} -> net_{h1} -> sigmoid(net_{h1}) -> out_{h1}

out_{h1} -> o_{1} -> net_{o1} -> sigmoid(net_{o1}) -> out_{o1}

Error of each output node: E_{o1} = \frac{1}{2}(target_{o1}-out_{o1})^{2}
Total error: E_{total} = E_{o1} + E_{o2}

The computation in Python:

import math

# Forward propagation
# [Deep Learning: A Concrete Example of Backpropagation](https://zhuanlan.zhihu.com/p/23270674)
# Flow:
# input(w1,w2) -> net_h1 -> out_h1 -> out_h1(w5,w6) -> net_o1 -> out_o1 -> e_o1

# Layer 1 inputs
i1 = .05
i2 = .10
b1 = .35  # bias

# weights
w1 = .15 
w2 = .20
w3 = .25
w4 = .30

# Layer 2 (hidden layer):
net_h1 = w1*i1 + w2*i2 + b1*1
out_h1 = 1/(1 + math.exp(-net_h1))  # activation: sigmoid(x)

net_h2 = w3*i1+w4*i2+b1*1
out_h2 = 1/(1+math.exp(-net_h2))
print("net_h1:",net_h1)
print("out_h1:",out_h1)
print("net_h2:",net_h2)
print("out_h2:",out_h2)

b2 = .60  # bias
# weights
w5 = .40
w6 = .45
w7 = .50
w8 = .55

# Layer 3 (output layer):
net_o1 = w5*out_h1+w6*out_h2+b2
out_o1 = 1/(1+math.exp(-net_o1))

net_o2 = w7*out_h1+w8*out_h2+b2
out_o2 = 1/(1+math.exp(-net_o2))
print("net_o1:", net_o1)
print("out_o1:", out_o1)
print("net_o2:", net_o2)
print("out_o2:", out_o2)

# target outputs from the training set
o1 = .01
o2 = .99

# squared-error loss of each output
e_o1 = math.pow((o1-out_o1),2)/2
print("e_o1:",e_o1)
e_o2 = math.pow(o2-out_o2,2)/2
print("e_o2:",e_o2)
e_total = e_o1 + e_o2
print("e_total:",e_total)

Printed results:

net_h1: 0.3775
out_h1: 0.5932699921071872
net_h2: 0.39249999999999996
out_h2: 0.596884378259767
net_o1: 1.10590596705977
out_o1: 0.7513650695523157
net_o2: 1.2249214040964653
out_o2: 0.7729284653214625
e_o1: 0.274811083176155
e_o2: 0.023560025583847746
e_total: 0.2983711087600027

Backpropagation


Backpropagation updates the parameters (the weights w and biases b) using the chain rule of differentiation.

I. Updating the weights w

For w_{5}, we want to know how much a change in it affects the total error:
\frac{\partial E_{total}}{\partial w_{5}}=\frac{\partial E_{total}}{\partial out_{o1}}*\frac{\partial out_{o1}}{\partial net_{o1}}*\frac{\partial net_{o1}}{\partial w_{5}}

First:
E_{total}=\frac{1}{2}(target_{o1}-out_{o1})^{2}+\frac{1}{2}(target_{o2}-out_{o2})^{2}

\frac{\partial E_{total}}{\partial out_{o1}}=2*\frac{1}{2}(target_{o1}-out_{o1})^{2-1}*(-1)+0
=-(target_{o1}-out_{o1})

This applies the chain rule. Since we differentiate with respect to out_{o1}, the second term does not depend on it, so its derivative is 0.

Then:
out_{o1}=\frac{1}{1+e^{-net_{o1}}}
\frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}(1-out_{o1})
This step is the derivative of the sigmoid function.

Reference:
A proof of the derivative of the sigmoid function

That looks a bit involved, so let me derive it myself:
f(x)=\frac{1}{1+e^{-x}}=(1+e^{-x})^{-1}
f(x)^\prime=-(1+e^{-x})^{-2}*(-e^{-x})
This uses the chain rule. For (e^{-x})^\prime=-e^{-x}, note:

y=e^{-x} can be viewed as the composition of y=e^{t} and t=-x. By the rule for differentiating composite functions, differentiate y with respect to t to get e^{t}, then differentiate t with respect to x to get -1; multiply the two derivatives and substitute t=-x back, giving (e^{-x})^\prime=e^{-x}*(-1)=-e^{-x}.

Rearranging:
=(1+e^{-x})^{-2}*e^{-x}=\frac{e^{-x}}{(1+e^{-x})^{2}}=\frac{(1+e^{-x})-1}{(1+e^{-x})^{2}}=\frac{(1+e^{-x})}{(1+e^{-x})^{2}}-\frac{1}{(1+e^{-x})^{2}}
=\frac{1}{(1+e^{-x})}-\frac{1}{(1+e^{-x})^{2}}
=\frac{1}{(1+e^{-x})}(1-\frac{1}{(1+e^{-x})})
So:
\frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}(1-out_{o1})
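As a sanity check on this result, the closed-form derivative can be compared against a numerical finite difference (a quick sketch; the function names are mine):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_prime(x):
    # Closed form derived above: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1 - s)

# A central finite difference should agree with the closed form
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(x, sigmoid_prime(x), numeric)
```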
Finally:
net_{o1}=w_5*out_{h1}+w_6*out_{h2}+b_2*1
\frac{\partial net_{o1}}{\partial w_5}=(w_5*out_{h1})^\prime+0+0=1*w_5^{1-1}*out_{h1}=out_{h1}
Now that all the partial derivatives are available, we can compute:
\frac{\partial E_{total}}{\partial w_{5}}=\frac{\partial E_{total}}{\partial out_{o1}}*\frac{\partial out_{o1}}{\partial net_{o1}}*\frac{\partial net_{o1}}{\partial w_{5}}

To reduce the error, we subtract this value from the current weight, scaled by a learning rate \eta (e.g. 0.5):
w_{5}^{+}=w_5-\eta *\frac{\partial E_{total}}{\partial w_{5}}
Following the same steps, compute w_{6}^{+}, w_{7}^{+}, w_{8}^{+}.
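Plugging in the numbers printed by the forward pass above gives the gradient for w_5 (a small sketch reusing those values; the variable names are mine):

```python
# Values from the forward pass above
out_h1 = 0.5932699921071872
out_o1 = 0.7513650695523157
target_o1 = 0.01
w5, eta = 0.40, 0.5

# Chain rule: dE_total/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout_o1 = -(target_o1 - out_o1)      # = out_o1 - target_o1
dout_dnet_o1 = out_o1 * (1 - out_o1)    # sigmoid derivative
dnet_dw5 = out_h1

dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5
w5_new = w5 - eta * dE_dw5
print("dE/dw5:", dE_dw5)  # ~0.08216704
print("w5+:", w5_new)     # ~0.35891648
```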
Hidden layer:
\frac{\partial E_{total}}{\partial w_{1}}=\frac{\partial E_{total}}{\partial out_{h1}}*\frac{\partial out_{h1}}{\partial net_{h1}}*\frac{\partial net_{h1}}{\partial w_{1}}
1. Decompose it:
\frac{\partial E_{total}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial out_{h1}}+\frac{\partial E_{o2}}{\partial out_{h1}}
Each term expands as before:
\frac{\partial E_{o1}}{\partial out_{h1}}=\frac{\partial E_{o1}}{\partial net_{o1}}*\frac{\partial net_{o1}}{\partial out_{h1}}
\frac{\partial E_{o1}}{\partial net_{o1}}=\frac{\partial E_{o1}}{\partial out_{o1}}*\frac{\partial out_{o1}}{\partial net_{o1}}
where:
\frac{\partial E_{o1}}{\partial out_{o1}} was already obtained above when computing \frac{\partial E_{total}}{\partial out_{o1}}.
\frac{\partial out_{o1}}{\partial net_{o1}} has also already been computed.
Now find \frac{\partial net_{o1}}{\partial out_{h1}}:
Since net_{o1}=w_5*out_{h1}+w_6*out_{h2}+b_2*1,
we get:
\frac{\partial net_{o1}}{\partial out_{h1}}=w_5+0+0=w_5
Now \frac{\partial E_{o1}}{\partial out_{h1}} can be evaluated.
The same reasoning gives \frac{\partial E_{o2}}{\partial out_{h1}}, and adding the two yields \frac{\partial E_{total}}{\partial out_{h1}}.
2. Find \frac{\partial out_{h1}}{\partial net_{h1}}:
Since out_{h1}=\frac{1}{1+e^{-net_{h1}}},
we get: \frac{\partial out_{h1}}{\partial net_{h1}}=out_{h1}*(1-out_{h1})
3. Find \frac{\partial net_{h1}}{\partial w_{1}}:
Since net_{h1}=w_1*i_1+w_2*i_2+b_1*1,
we get: \frac{\partial net_{h1}}{\partial w_{1}}=i_1+0+0=i_1
4. Now w_1 can be updated:
w_{1}^{+}=w_1-\eta *\frac{\partial E_{total}}{\partial w_{1}}
5. Following the same steps, compute w_{2}^{+}, w_{3}^{+}, w_{4}^{+}.
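The hidden-layer steps above, again with the forward-pass numbers plugged in (a small sketch; the variable names are mine):

```python
# Values from the forward pass above
i1 = 0.05
out_h1 = 0.5932699921071872
out_o1 = 0.7513650695523157
out_o2 = 0.7729284653214625
target_o1, target_o2 = 0.01, 0.99
w5, w7 = 0.40, 0.50
w1, eta = 0.15, 0.5

# Delta of each output node: dE/dnet_o = (out - target) * out * (1 - out)
delta_o1 = (out_o1 - target_o1) * out_o1 * (1 - out_o1)
delta_o2 = (out_o2 - target_o2) * out_o2 * (1 - out_o2)

# dE_total/dout_h1 sums the error routed back through each output node
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7

# Multiply by sigmoid'(net_h1) and dnet_h1/dw1 = i1
dE_dw1 = dE_dout_h1 * out_h1 * (1 - out_h1) * i1
w1_new = w1 - eta * dE_dw1
print("dE/dw1:", dE_dw1)  # ~0.00043857
print("w1+:", w1_new)     # ~0.14978072
```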


All the weights w are now updated. After many iterations of backpropagation, the total error keeps shrinking and the network's outputs move closer and closer to the targets.

II. Updating the biases b

1. Updating b_2
\frac{\partial E_{total}}{\partial b_{2}}=\frac{\partial E_{total}}{\partial out_{o1}}*\frac{\partial out_{o1}}{\partial net_{o1}}*\frac{\partial net_{o1}}{\partial b_{2}}

Here \frac{\partial E_{total}}{\partial out_{o1}} and \frac{\partial out_{o1}}{\partial net_{o1}} were already obtained when updating w_5.
\frac{\partial net_{o1}}{\partial b_{2}}=0+0+(b_2*1)^\prime=1

Then: b_{2}^{+}=b_2-\eta *\frac{\partial E_{total}}{\partial b_{2}}

2. Updating b_1
\frac{\partial E_{total}}{\partial b_{1}}=\frac{\partial E_{total}}{\partial out_{h1}}*\frac{\partial out_{h1}}{\partial net_{h1}}*\frac{\partial net_{h1}}{\partial b_{1}}

Here \frac{\partial E_{total}}{\partial out_{h1}} and \frac{\partial out_{h1}}{\partial net_{h1}} were already obtained when updating w_1.
\frac{\partial net_{h1}}{\partial b_{1}}=0+0+(b_1*1)^\prime=1
Then: b_{1}^{+}=b_1-\eta *\frac{\partial E_{total}}{\partial b_{1}}

3. The other biases can be updated in the same way; in general, each neuron can be given its own independent bias.
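Numerically, and noting that in this network b_2 feeds both output nodes, so the full gradient sums both branches (the chain above follows only the o_1 branch), a quick sketch with my own variable names:

```python
# Values from the forward pass above
out_o1 = 0.7513650695523157
out_o2 = 0.7729284653214625
target_o1, target_o2 = 0.01, 0.99
b2, eta = 0.60, 0.5

# Output deltas: dE/dnet_o = (out - target) * out * (1 - out)
delta_o1 = (out_o1 - target_o1) * out_o1 * (1 - out_o1)
delta_o2 = (out_o2 - target_o2) * out_o2 * (1 - out_o2)

# dnet_o/db2 = 1 for both output nodes, so the gradient is the sum of deltas
dE_db2 = delta_o1 + delta_o2
b2_new = b2 - eta * dE_db2
print("dE/db2:", dE_db2)
print("b2+:", b2_new)
```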

III. With the derivation above, a Python program is straightforward to write.
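One minimal sketch of such a program, looping forward and backward passes over the single training example above. It assumes the per-layer biases b_1 and b_2 are shared by their layer's neurons and updated too; with enough iterations the total error becomes very small:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(w, b, i1, i2):
    """One forward pass; returns all activations."""
    out_h1 = sigmoid(w[1]*i1 + w[2]*i2 + b[1])
    out_h2 = sigmoid(w[3]*i1 + w[4]*i2 + b[1])
    out_o1 = sigmoid(w[5]*out_h1 + w[6]*out_h2 + b[2])
    out_o2 = sigmoid(w[7]*out_h1 + w[8]*out_h2 + b[2])
    return out_h1, out_h2, out_o1, out_o2

i1, i2 = 0.05, 0.10            # inputs
t1, t2 = 0.01, 0.99            # targets
w = {1: .15, 2: .20, 3: .25, 4: .30, 5: .40, 6: .45, 7: .50, 8: .55}
b = {1: .35, 2: .60}
eta = 0.5                      # learning rate

for step in range(10000):
    out_h1, out_h2, out_o1, out_o2 = forward(w, b, i1, i2)
    # Output deltas: dE/dnet_o = (out - target) * out * (1 - out)
    d_o1 = (out_o1 - t1) * out_o1 * (1 - out_o1)
    d_o2 = (out_o2 - t2) * out_o2 * (1 - out_o2)
    # Hidden deltas: propagate the output deltas back through w5..w8
    d_h1 = (d_o1*w[5] + d_o2*w[7]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1*w[6] + d_o2*w[8]) * out_h2 * (1 - out_h2)
    # Gradient-descent updates (deltas were computed with the old weights)
    w[5] -= eta * d_o1 * out_h1; w[6] -= eta * d_o1 * out_h2
    w[7] -= eta * d_o2 * out_h1; w[8] -= eta * d_o2 * out_h2
    w[1] -= eta * d_h1 * i1;     w[2] -= eta * d_h1 * i2
    w[3] -= eta * d_h2 * i1;     w[4] -= eta * d_h2 * i2
    b[2] -= eta * (d_o1 + d_o2)  # b2 feeds both output nodes
    b[1] -= eta * (d_h1 + d_h2)  # b1 feeds both hidden nodes

out_h1, out_h2, out_o1, out_o2 = forward(w, b, i1, i2)
e_total = (t1 - out_o1)**2/2 + (t2 - out_o2)**2/2
print("e_total after training:", e_total)
print("out_o1:", out_o1, "out_o2:", out_o2)
```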