Lecture 4 | Backpropagation
2019-10-19
Ysgc
![](https://img.haomeiwen.com/i11683600/480aa2670a36eb35.png)
![](https://img.haomeiwen.com/i11683600/4c65db8fa7d68c0a.png)
![](https://img.haomeiwen.com/i11683600/200685cded36b2e1.png)
![](https://img.haomeiwen.com/i11683600/eab8244c4e551b97.png)
![](https://img.haomeiwen.com/i11683600/d85c09de3304ed8c.png)
vector activation vs scalar activation
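The scalar/vector distinction shows up in the Jacobian of the activation. A scalar activation such as the sigmoid acts on each component independently, so its Jacobian is diagonal. A vector activation such as softmax makes every output depend on every input, so its Jacobian is a full matrix. The minimal pure-Python sketch below uses sigmoid and softmax as the two examples. That choice is my assumption for illustration, not taken from the slides.

```python
import math

def sigmoid(z):
    """Scalar activation: applied independently to each component."""
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def softmax(z):
    """Vector activation: every output depends on every input."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid_jacobian(z):
    # Diagonal: dy_i/dz_j = 0 whenever i != j
    y = sigmoid(z)
    n = len(z)
    return [[y[i] * (1.0 - y[i]) if i == j else 0.0 for j in range(n)]
            for i in range(n)]

def softmax_jacobian(z):
    # Full matrix: dy_i/dz_j = y_i * (delta_ij - y_j)
    y = softmax(z)
    n = len(z)
    return [[y[i] * ((1.0 if i == j else 0.0) - y[j]) for j in range(n)]
            for i in range(n)]
```

For example, `softmax_jacobian([1.0, 2.0, 0.5])` has non-zero off-diagonal entries, and each of its rows sums to zero because the softmax outputs always sum to 1.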
![](https://img.haomeiwen.com/i11683600/8b821dab1c208b2a.png)
![](https://img.haomeiwen.com/i11683600/9e3d05396bd6a170.png)
![](https://img.haomeiwen.com/i11683600/1a8aec583ff5f8ac.png)
![](https://img.haomeiwen.com/i11683600/d2660708ab0c6c42.png)
![](https://img.haomeiwen.com/i11683600/bee1aa5feea47d76.png)
![](https://img.haomeiwen.com/i11683600/2c306549221314a0.png)
![](https://img.haomeiwen.com/i11683600/14756aa4d96bd11e.png)
sigmoid output -> probability of the positive class (binary classification)
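Reading a single sigmoid output as a class probability can be sketched as follows, assuming a two-class problem with one output logit `z`. The helper names `predict_proba` and `predict` are hypothetical and chosen for illustration only.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(z: float) -> tuple:
    """Hypothetical helper: interpret one logit z as a two-class distribution."""
    p1 = sigmoid(z)
    return (1.0 - p1, p1)  # (P(class 0), P(class 1)); note 1 - sigmoid(z) == sigmoid(-z)

def predict(z: float) -> int:
    # p1 > 0.5 exactly when z > 0, so the decision boundary is z = 0
    return 1 if sigmoid(z) > 0.5 else 0
```

For example, `predict_proba(0.0)` returns `(0.5, 0.5)`: a logit of zero means the network is undecided.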
![](https://img.haomeiwen.com/i11683600/7cf47f3f85f465cb.png)
![](https://img.haomeiwen.com/i11683600/482d7ba5f163fb10.png)
![](https://img.haomeiwen.com/i11683600/a022946e6f834c2e.png)
how should the error be defined?
![](https://img.haomeiwen.com/i11683600/577354d3401171c2.png)
first choice: squared Euclidean distance
L2 divergence -> its derivative is simply:
![](https://img.haomeiwen.com/i11683600/4353b053c5514498.png)
![](https://img.haomeiwen.com/i11683600/4f7969a7cbcad588.png)
gradient < 0 => y_i should increase to reduce the divergence
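Here is a small sketch of the L2 divergence and its gradient, checked against a central finite difference. It assumes the common ½-scaled form $\mathrm{Div}(Y, d) = \tfrac{1}{2}\sum_i (y_i - d_i)^2$, so that $\partial \mathrm{Div}/\partial y_i = y_i - d_i$. Without the ½ the gradient would simply be doubled.

```python
def l2_div(y, d):
    """Div(Y, d) = 1/2 * sum_i (y_i - d_i)^2  (assumed 1/2 scaling)."""
    return 0.5 * sum((yi - di) ** 2 for yi, di in zip(y, d))

def l2_grad(y, d):
    """Analytic gradient: dDiv/dy_i = y_i - d_i."""
    return [yi - di for yi, di in zip(y, d)]

def numerical_grad(f, y, eps=1e-6):
    """Central finite differences, used only to sanity-check l2_grad."""
    g = []
    for i in range(len(y)):
        yp = list(y); yp[i] += eps
        ym = list(y); ym[i] -= eps
        g.append((f(yp) - f(ym)) / (2 * eps))
    return g

y, d = [0.2, 0.9], [1.0, 0.0]
g = l2_grad(y, d)
# g[0] < 0: y_0 is below its target, so increasing y_0 reduces the divergence.
# g[1] > 0: y_1 is above its target, so y_1 should decrease.
```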
![](https://img.haomeiwen.com/i11683600/47dd4d8d1e07ba5e.png)
the smoothed target is arithmetically "wrong" (it is no longer the true one-hot label), but label smoothing helps gradient descent!
it helps avoid overshooting
https://leimao.github.io/blog/Label-Smoothing/
![](https://img.haomeiwen.com/i11683600/dd5e50b47aff9f5b.png)
it's a heuristic
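The sketch below illustrates the heuristic with cross-entropy on a softmax output. It assumes the common formulation $d^{LS} = (1-\alpha)\,d^{\text{one-hot}} + \alpha/K$ (as in the linked post). It also uses the standard result that the gradient of $-\sum_i d_i \log y_i$ with respect to the softmax logits is $y - d$ whenever $\sum_i d_i = 1$.

With a one-hot target this gradient never vanishes at finite logits, so the correct-class logit keeps getting pushed up. With a smoothed target the gradient reaches zero at finite logits.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def smooth_labels(target, num_classes, alpha):
    """Assumed formulation: (1 - alpha) * one_hot + alpha / K."""
    return [(1 - alpha) * (1.0 if k == target else 0.0) + alpha / num_classes
            for k in range(num_classes)]

def cross_entropy(y, d):
    # Skip zero-weight terms so that 0 * log(0) is never evaluated
    return -sum(di * math.log(yi) for yi, di in zip(y, d) if di > 0)

def ce_grad_logits(z, d):
    """Gradient of cross_entropy(softmax(z), d) w.r.t. z: softmax(z) - d (needs sum(d) == 1)."""
    y = softmax(z)
    return [yi - di for yi, di in zip(y, d)]

# K = 3, alpha = 0.1 gives d = [28/30, 1/30, 1/30].
d = smooth_labels(0, 3, 0.1)
z_star = [math.log(28.0), 0.0, 0.0]           # softmax(z_star) == d
grad_smoothed = ce_grad_logits(z_star, d)     # ~[0, 0, 0]: a finite optimum
grad_one_hot = ce_grad_logits(z_star, smooth_labels(0, 3, 0.0))  # still pushes z_0 upward
```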
![](https://img.haomeiwen.com/i11683600/823d728925b1c597.png)
![](https://img.haomeiwen.com/i11683600/5fc677df93929480.png)
![](https://img.haomeiwen.com/i11683600/f471ba0ab773e03a.png)
![](https://img.haomeiwen.com/i11683600/e21e5063358d4e73.png)
![](https://img.haomeiwen.com/i11683600/ae41d793be51f534.png)
![](https://img.haomeiwen.com/i11683600/b557101b6616d4bc.png)
![](https://img.haomeiwen.com/i11683600/23b9c8ca21d62acd.png)
![](https://img.haomeiwen.com/i11683600/1adea5babbe9c8bc.png)
![](https://img.haomeiwen.com/i11683600/f2fee6874eb4fcdd.png)
forward pass
![](https://img.haomeiwen.com/i11683600/285b742f6cb88d54.png)
![](https://img.haomeiwen.com/i11683600/fdb85f8a76dd2954.png)
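Here is a minimal forward pass that keeps every intermediate value, because the backward pass needs them. It assumes the convention $z^{(k)} = W^{(k)} y^{(k-1)} + b^{(k)}$ and $y^{(k)} = f(z^{(k)})$. Each `W` is stored as a list of rows with shape (out, in), and the same sigmoid is used in every layer. These are illustrative choices, not necessarily the lecture's exact setup.

```python
import math

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def matvec(W, v):
    # W is a list of rows: (out, in); v has length in
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def forward(x, weights, biases, act=sigmoid):
    """Return (zs, ys): pre-activations and activations of every layer.

    ys[0] is the input x; ys[k + 1] = act(zs[k]).
    """
    ys = [list(x)]
    zs = []
    for W, b in zip(weights, biases):
        z = [zi + bi for zi, bi in zip(matvec(W, ys[-1]), b)]
        zs.append(z)
        ys.append(act(z))
    return zs, ys
```

For a 2-2-1 network, `forward([1.0, 2.0], [[[0.5, -0.3], [0.1, 0.8]], [[1.0, -1.0]]], [[0.0, 0.1], [0.2]])` yields `zs[0]` close to `[-0.1, 1.8]` and a single output in `ys[-1]`.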
backward pass
(1) trivial: gradient w.r.t. the network output
![](https://img.haomeiwen.com/i11683600/9e247d110f36db76.png)
(2) gradient through the final activation layer (w.r.t. its input z)
![](https://img.haomeiwen.com/i11683600/0bc24a6242507f4a.png)
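Step (2) differs for the two kinds of activation. For a scalar activation the chain rule collapses to an elementwise product, $\partial \mathrm{Div}/\partial z_i = f'(z_i)\,\partial \mathrm{Div}/\partial y_i$. For a vector activation every $y_i$ contributes to every $z_j$, so the sum over $i$ has to be kept. The sketch below uses sigmoid and softmax as assumed examples, and `grad_y` stands for the gradient coming from step (1).

```python
import math

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def grad_z_scalar_act(z, grad_y):
    """Scalar activation (sigmoid): dDiv/dz_i = f'(z_i) * dDiv/dy_i, with f' = y (1 - y)."""
    y = sigmoid(z)
    return [yi * (1.0 - yi) * gi for yi, gi in zip(y, grad_y)]

def grad_z_vector_act(z, grad_y):
    """Vector activation (softmax): dDiv/dz_j = sum_i dy_i/dz_j * dDiv/dy_i,
    where dy_i/dz_j = y_i * (delta_ij - y_j)."""
    y = softmax(z)
    n = len(z)
    return [sum(y[i] * ((1.0 if i == j else 0.0) - y[j]) * grad_y[i] for i in range(n))
            for j in range(n)]
```

For softmax the sum simplifies to $y_j\,(g_j - \sum_i y_i g_i)$, which avoids building the full Jacobian.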
(3) gradient w.r.t. the last group of weights
![](https://img.haomeiwen.com/i11683600/155930fa62932104.png)
![](https://img.haomeiwen.com/i11683600/9b1ec5fa799c1988.png)
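Step (3) is an outer product. Since $z_j = \sum_i W_{ji}\, y_i^{(N-1)} + b_j$, each weight receives $\partial \mathrm{Div}/\partial z_j$ times the input it multiplies. This is a minimal sketch assuming `W` is stored as (out, in) rows; the slides may index weights as $w_{ij}$ instead, which is the transpose of this layout.

```python
def grad_weights(grad_z, y_prev):
    """dDiv/dW[j][i] = dDiv/dz_j * y_prev[i]  and  dDiv/db[j] = dDiv/dz_j.

    Assumes z_j = sum_i W[j][i] * y_prev[i] + b[j], i.e. W stored as (out, in).
    """
    dW = [[gj * yi for yi in y_prev] for gj in grad_z]
    db = list(grad_z)
    return dW, db
```

For example, `grad_weights([0.5, -1.0], [2.0, 3.0, 4.0])` gives `dW == [[1.0, 1.5, 2.0], [-2.0, -3.0, -4.0]]` and `db == [0.5, -1.0]`.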
(4) gradient w.r.t. the outputs y of the second-to-last layer
![](https://img.haomeiwen.com/i11683600/6e19a1826b1e406c.png)
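Step (4) sends the gradient back through the weights. Each $y_i^{(N-1)}$ feeds every $z_j$, so $\partial \mathrm{Div}/\partial y_i^{(N-1)} = \sum_j W_{ji}\, \partial \mathrm{Div}/\partial z_j$, i.e. $W^\top$ times the gradient. The same (out, in) storage assumption as before applies.

```python
def grad_prev_activations(W, grad_z):
    """dDiv/dy_prev[i] = sum_j W[j][i] * dDiv/dz_j  (i.e. W^T @ grad_z).

    Assumes W is stored as (out, in) rows.
    """
    n_out, n_in = len(W), len(W[0])
    return [sum(W[j][i] * grad_z[j] for j in range(n_out)) for i in range(n_in)]
```

For example, `grad_prev_activations([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [1.0, -1.0])` returns `[-3.0, -3.0, -3.0]`. Steps (2) to (4) then repeat layer by layer toward the input.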
(5) in summary: pseudocode & comparison of the backward and forward passes
![](https://img.haomeiwen.com/i11683600/aa429325ea068ccd.png)
![](https://img.haomeiwen.com/i11683600/d47975cce7c3be94.png)
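Steps (1) to (4) can be put together into a runnable sketch of the full loop. It assumes sigmoid in every layer, the ½-scaled L2 divergence, and (out, in) weight storage, so it is an illustration under those assumptions rather than the lecture's exact pseudocode. The backward loop mirrors the forward one: it runs over the layers in reverse, multiplies by $f'$ instead of applying $f$, and uses $W^\top$ instead of $W$.

```python
import math

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def forward(x, weights, biases):
    """Return (zs, ys) with ys[0] = x and ys[k + 1] = sigmoid(zs[k])."""
    ys, zs = [list(x)], []
    for W, b in zip(weights, biases):
        z = [sum(w * y for w, y in zip(row, ys[-1])) + bj for row, bj in zip(W, b)]
        zs.append(z)
        ys.append(sigmoid(z))
    return zs, ys

def l2_div(y, d):
    return 0.5 * sum((yi - di) ** 2 for yi, di in zip(y, d))

def backward(x, d, weights, biases):
    """Return (dWs, dbs) for every layer, for Div = 1/2 * ||y_N - d||^2."""
    _, ys = forward(x, weights, biases)
    # (1) gradient w.r.t. the network output
    grad_y = [yi - di for yi, di in zip(ys[-1], d)]
    dWs, dbs = [None] * len(weights), [None] * len(weights)
    for k in reversed(range(len(weights))):
        # (2) through the sigmoid: f'(z) = y * (1 - y), with y = ys[k + 1]
        grad_z = [yi * (1.0 - yi) * gi for yi, gi in zip(ys[k + 1], grad_y)]
        # (3) weights and biases of layer k: outer product with the layer input ys[k]
        dWs[k] = [[gj * yi for yi in ys[k]] for gj in grad_z]
        dbs[k] = list(grad_z)
        # (4) gradient w.r.t. the previous layer's outputs: W^T @ grad_z
        W = weights[k]
        grad_y = [sum(W[j][i] * grad_z[j] for j in range(len(W)))
                  for i in range(len(W[0]))]
    return dWs, dbs

x, d = [1.0, 2.0], [0.0]
weights = [[[0.5, -0.3], [0.1, 0.8]], [[1.0, -1.0]]]
biases = [[0.0, 0.1], [0.2]]
dWs, dbs = backward(x, d, weights, biases)
```

A finite-difference check on any single weight is a cheap way to confirm an implementation like this before trusting it.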
![](https://img.haomeiwen.com/i11683600/dad459bd500ab084.png)
![](https://img.haomeiwen.com/i11683600/7ba4aa19601051f4.png)
![](https://img.haomeiwen.com/i11683600/bd970e885be2dcd1.png)
![](https://img.haomeiwen.com/i11683600/9a45652566acd4a6.png)
![](https://img.haomeiwen.com/i11683600/855f676b50d2b39c.png)
![](https://img.haomeiwen.com/i11683600/1fb028182dff82a2.png)
![](https://img.haomeiwen.com/i11683600/2705e9027aeecee1.png)
![](https://img.haomeiwen.com/i11683600/c406b7e61b7600a6.png)
![](https://img.haomeiwen.com/i11683600/494acf58bf49ae1c.png)
![](https://img.haomeiwen.com/i11683600/2e733826f4874c9d.png)
![](https://img.haomeiwen.com/i11683600/cc4ca8f3eb6d049d.png)
![](https://img.haomeiwen.com/i11683600/4051aae72f3f48d5.png)