Mathematical Derivation and a Simple Implementation of a Neural Network

2019-04-09  迎风漂扬

As shown in the figure, we start from a simple neural network whose input is the vector (x1, x2, x3, x4).
For every node in the hidden layer and the output layer:
u_{j}=\sum_{i} w_{i j} x_{i}
x_{j}=f\left(u_{j}\right)
w_{i j}: the connection weight between nodes of adjacent layers
x_{i}: the output of a node in the previous layer
u_{j}: the input of this node
x_{j}: the output of this node
f: the activation function; ReLU is used here

self.activative = lambda x: np.maximum(0, x)              # activation function (ReLU)
self.dirivative = lambda x: np.where(x > 0, 1.0, 0.0)     # its derivative, elementwise on arrays
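
As a quick sanity check, both lambdas operate elementwise on NumPy arrays (a standalone sketch, not part of the original class; the sample vector u is made up):

import numpy as np

activative = lambda x: np.maximum(0, x)           # ReLU
dirivative = lambda x: np.where(x > 0, 1.0, 0.0)  # ReLU derivative

u = np.array([[-1.0], [0.5], [2.0]])              # sample column vector
print(activative(u).ravel())                      # [0.  0.5 2. ]
print(dirivative(u).ravel())                      # [0. 1. 1.]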

For example, for the hidden-layer nodes 5, 6, and 7:
x_{5}=f\left(\sum_{i=1}^{4} w_{i 5} x_{i}\right)
x_{6}=f\left(\sum_{i=1}^{4} w_{i 6} x_{i}\right)
x_{7}=f\left(\sum_{i=1}^{4} w_{i 7} x_{i}\right)

For convenience, this can be rewritten in matrix form:
f\left(\left[ \begin{array}{cccc}{w_{15}} & {w_{25}} & {w_{35}} & {w_{45}} \\ {w_{16}} & {w_{26}} & {w_{36}} & {w_{46}} \\ {w_{17}} & {w_{27}} & {w_{37}} & {w_{47}}\end{array}\right] \left[ \begin{array}{c}{x_{1}} \\ {x_{2}} \\ {x_{3}} \\ {x_{4}}\end{array}\right]\right)=f\left(\left[ \begin{array}{c}{u_{5}} \\ {u_{6}} \\ {u_{7}}\end{array}\right]\right)=\left[ \begin{array}{c}{x_{5}} \\ {x_{6}} \\ {x_{7}}\end{array}\right]

self.u_hidden = np.dot(self.W_hidden, self.input)   # hidden-layer inputs u5..u7
self.x_hidden = self.activative(self.u_hidden)      # hidden-layer outputs x5..x7

Next comes the output-layer computation:
f\left(\left[ \begin{array}{ccc}{w_{58}} & {w_{68}} & {w_{78}} \\ {w_{59}} & {w_{69}} & {w_{79}}\end{array}\right] \left[ \begin{array}{c}{x_{5}} \\ {x_{6}} \\ {x_{7}}\end{array}\right]\right)=f\left(\left[ \begin{array}{l}{u_{8}} \\ {u_{9}}\end{array}\right]\right)=\left[ \begin{array}{l}{x_{8}} \\ {x_{9}}\end{array}\right]=\left[ \begin{array}{l}{y_{1}} \\ {y_{2}}\end{array}\right]

self.u_output = np.dot(self.W_ouput, self.x_hidden)   # output-layer inputs u8, u9
self.y = self.activative(self.u_output)               # network outputs y1, y2

That completes the forward computation. Next we discuss the backward propagation of the error, starting with the loss function:
E=\frac{1}{2} \sum_{i}\left(t_{i}-y_{i}\right)^{2}
where E is the error, t_{i} is the training label and y_{i} is the network output.
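
A one-line sketch of this loss in the same NumPy style (assuming self.target and self.y hold the label and output column vectors used in the snippets below):

self.E = 0.5 * np.sum((self.target - self.y) ** 2)   # squared-error loss E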

During training, what we actually need to learn are the weights of each layer. According to the gradient descent algorithm:
w_{j k}=w_{j k}-\eta \nabla w_{j k}
\eta: the learning rate
\nabla w_{j k}: the gradient of E with respect to w_{j k}

We first derive the gradients of the output layer, for example:
\nabla w_{58}=\frac{\partial E}{\partial w_{58}}=\frac{\partial E}{\partial u_{8}} \frac{\partial u_{8}}{\partial w_{58}}
because the weight w_{58} affects the error E only through the input u_{8} of node 8. We now evaluate each factor in turn:
\frac{\partial u_{8}}{\partial w_{58}}=\frac{\partial\left(x_{5} w_{58}+x_{6} w_{68}+x_{7} w_{78}\right)}{\partial w_{58}}=x_{5}
\frac{\partial E}{\partial u_{8}}=\frac{\partial E}{\partial x_{8}} \frac{\partial x_{8}}{\partial u_{8}}
\frac{\partial x_{8}}{\partial u_{8}}=f^{\prime}\left(u_{8}\right)
\frac{\partial E}{\partial x_{8}}=\frac{\partial E}{\partial y_{1}}=\frac{\partial \frac{1}{2} \sum_{i}\left(t_{i}-y_{i}\right)^{2}}{\partial y_{1}}=-\left(t_{1}-y_{1}\right)

Combining the expressions above gives:
\nabla w_{58}=-\left(t_{1}-y_{1}\right) f^{\prime}\left(u_{8}\right) x_{5}

Generalizing the subscripts, we obtain the formula:
\nabla w_{j k}=-\left(t_{k}-y_{k}\right) f^{\prime}\left(u_{k}\right) x_{j}
Defining
\delta_{k}=\left(t_{k}-y_{k}\right) f^{\prime}\left(u_{k}\right)
as the error term of node k, the final update rule becomes:
w_{j k}=w_{j k}+\eta \delta_{k} x_{j}

In matrix form:
\left[ \begin{array}{c}{\delta_{8}} \\ {\delta_{9}}\end{array}\right]=\left[ \begin{array}{l}{\left(t_{1}-y_{1}\right) f^{\prime}\left(u_{8}\right)} \\ {\left(t_{2}-y_{2}\right) f^{\prime}\left(u_{9}\right)}\end{array}\right]

self.delta_output = (self.target - self.y) * self.dirivative(self.u_output)   # output-layer error terms δ8, δ9

\left[ \begin{array}{ccc}{-\nabla w_{58}} & {-\nabla w_{68}} & {-\nabla w_{78}} \\ {-\nabla w_{59}} & {-\nabla w_{69}} & {-\nabla w_{79}}\end{array}\right]=\left[ \begin{array}{c}{\delta_{8}} \\ {\delta_{9}}\end{array}\right] \left[ \begin{array}{ccc}{x_{5}} & {x_{6}} & {x_{7}}\end{array}\right]

self.nablaW_output = np.dot(self.delta_output, np.transpose(self.x_hidden))   # equals -∇W of the output layer
self.W_ouput += self.learningrate * self.nablaW_output

That completes the derivation and brief code for the output layer. For a hidden-layer weight such as w_{15}, the gradient of the error E is computed in the same way; only the error term is a little different:
\nabla w_{15}=\frac{\partial E}{\partial w_{15}}=\frac{\partial E}{\partial u_{5}} \frac{\partial u_{5}}{\partial w_{15}}
As before, set \frac{\partial E}{\partial u_{5}}=-\delta_{5}, so:
\frac{\partial u_{5}}{\partial w_{15}}=x_{1}
-\frac{\partial E}{\partial u_{5}}=\delta_{5}=-\frac{\partial E}{\partial x_{5}} \frac{\partial x_{5}}{\partial u_{5}}
\frac{\partial x_{5}}{\partial u_{5}}=f^{\prime}\left(u_{5}\right)

Computing \frac{\partial E}{\partial x_{5}} is a bit more involved, because the value of x_{5} affects all the nodes of the next layer (8 and 9), so by the total derivative rule:
\frac{\partial E}{\partial x_{5}}=\frac{\partial E}{\partial u_{8}} \frac{\partial u_{8}}{\partial x_{5}}+\frac{\partial E}{\partial u_{9}} \frac{\partial u_{9}}{\partial x_{5}}=\sum_{k=8}^{9} \frac{\partial E}{\partial u_{k}} \frac{\partial u_{k}}{\partial x_{5}}

\frac{\partial u_{k}}{\partial x_{5}}=\frac{\partial\left(\sum_{j} w_{j k} x_{j}\right)}{\partial x_{5}}=w_{5 k}
\frac{\partial E}{\partial x_{5}}=-\sum_{k=8}^{9} \delta_{k} w_{5 k}

Combining the formulas above, we get:
\delta_{5}=-\frac{\partial E}{\partial x_{5}} \frac{\partial x_{5}}{\partial u_{5}}=f^{\prime}\left(u_{5}\right) \sum_{k=8}^{9} \delta_{k} w_{5 k}
w_{15}=w_{15}-\eta \nabla w_{15}=w_{15}+\eta \delta_{5} x_{1}=w_{15}+\eta f^{\prime}\left(u_{5}\right) x_{1} \sum_{k=8}^{9} \delta_{k} w_{5 k}
Rearranging, the final formulas are:
\delta_{j}=f^{\prime}\left(u_{j}\right) \sum_{k} w_{j k} \delta_{k}
w_{i j}=w_{i j}+\eta \delta_{j} x_{i}

The detailed computation is as follows:
\left[ \begin{array}{ll}{\delta_{8}} & {\delta_{9}}\end{array}\right] \left[ \begin{array}{ccc}{w_{58}} & {w_{68}} & {w_{78}} \\ {w_{59}} & {w_{69}} & {w_{79}}\end{array}\right]=\left[ \begin{array}{ccc}{\sum_{k=8}^{9} \delta_{k} w_{5 k}} & {\sum_{k=8}^{9} \delta_{k} w_{6 k}} & {\sum_{k=8}^{9} \delta_{k} w_{7 k}}\end{array}\right]

temp = np.dot(np.transpose(self.delta_output), self.W_ouput)    # temporary row vector holding the sums above

\left[ \begin{array}{ccc}{\sum_{k=8}^{9} \delta_{k} w_{5 k}} & {\sum_{k=8}^{9} \delta_{k} w_{6 k}} & {\sum_{k=8}^{9} \delta_{k} w_{7 k}}\end{array}\right] \circ\left[ \begin{array}{ccc}{f^{\prime}\left(u_{5}\right)} & {f^{\prime}\left(u_{6}\right)} & {f^{\prime}\left(u_{7}\right)}\end{array}\right]=\left[ \begin{array}{ccc}{\delta_{5}} & {\delta_{6}} & {\delta_{7}}\end{array}\right]

self.delta_hidden = self.dirivative(self.u_hidden) * np.transpose(temp)   # hidden-layer error terms δ5..δ7

\left[ \begin{array}{c}{\delta_{5}} \\ {\delta_{6}} \\ {\delta_{7}}\end{array}\right] \left[ \begin{array}{llll}{x_{1}} & {x_{2}} & {x_{3}} & {x_{4}}\end{array}\right]=\left[ \begin{array}{cccc}{-\nabla w_{15}} & {-\nabla w_{25}} & {-\nabla w_{35}} & {-\nabla w_{45}} \\ {-\nabla w_{16}} & {-\nabla w_{26}} & {-\nabla w_{36}} & {-\nabla w_{46}} \\ {-\nabla w_{17}} & {-\nabla w_{27}} & {-\nabla w_{37}} & {-\nabla w_{47}}\end{array}\right]

self.nablaW_hidden = np.dot(self.delta_hidden, np.transpose(self.input))   # equals -∇W of the hidden layer
self.W_hidden += self.learningrate * self.nablaW_hidden

With this, the hidden-layer weights have been updated as well.
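
Putting the snippets together, here is a minimal runnable sketch of the whole network. It is my own assembly rather than the author's full code: the class name SimpleNet, the 4-3-2 layer sizes, the random initialization and the train method are assumptions; the attribute names match the snippets above, inputs and targets are treated as column vectors, and the weight updates follow the order used in the article.

import numpy as np

class SimpleNet:
    def __init__(self, n_in=4, n_hidden=3, n_out=2, learningrate=0.1):
        # small random weights for the 4-3-2 network in the figure
        self.W_hidden = np.random.randn(n_hidden, n_in) * 0.1
        self.W_ouput = np.random.randn(n_out, n_hidden) * 0.1
        self.learningrate = learningrate
        self.activative = lambda x: np.maximum(0, x)             # ReLU
        self.dirivative = lambda x: np.where(x > 0, 1.0, 0.0)    # ReLU derivative

    def forward(self, x):
        self.input = x.reshape(-1, 1)                            # column vector (x1..x4)
        self.u_hidden = np.dot(self.W_hidden, self.input)        # u5..u7
        self.x_hidden = self.activative(self.u_hidden)           # x5..x7
        self.u_output = np.dot(self.W_ouput, self.x_hidden)      # u8, u9
        self.y = self.activative(self.u_output)                  # y1, y2
        return self.y

    def backward(self, t):
        self.target = t.reshape(-1, 1)
        # output layer: delta_k = (t_k - y_k) * f'(u_k)
        self.delta_output = (self.target - self.y) * self.dirivative(self.u_output)
        self.nablaW_output = np.dot(self.delta_output, np.transpose(self.x_hidden))
        self.W_ouput += self.learningrate * self.nablaW_output
        # hidden layer: delta_j = f'(u_j) * sum_k w_jk * delta_k
        temp = np.dot(np.transpose(self.delta_output), self.W_ouput)
        self.delta_hidden = self.dirivative(self.u_hidden) * np.transpose(temp)
        self.nablaW_hidden = np.dot(self.delta_hidden, np.transpose(self.input))
        self.W_hidden += self.learningrate * self.nablaW_hidden

    def train(self, x, t):
        self.forward(x)
        loss = 0.5 * np.sum((t.reshape(-1, 1) - self.y) ** 2)    # loss E before the update
        self.backward(t)
        return loss

A short usage example on a made-up sample:

net = SimpleNet()
x = np.array([0.5, 0.1, 0.2, 0.7])
t = np.array([1.0, 0.0])
for _ in range(100):
    loss = net.train(x, t)
print(net.forward(x).ravel(), loss)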
