Lecture 13 | (2/5) Recurrent Neural Networks
2019-11-02
Ysgc
https://www.youtube.com/watch?v=jaw5W0bCgUQ
We don't want the network's state to blow up -> require BIBO (bounded-input, bounded-output) stability.
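A minimal sketch of that requirement, assuming a purely linear scalar recurrence h_t = w·h_{t-1} + x_t (the specific weight values are just examples): with a bounded input, the state stays bounded only when |w| <= 1.

```python
import numpy as np

def run_linear_recurrence(w, steps=50):
    """Iterate h_t = w * h_{t-1} + x_t with a bounded (constant) input."""
    h = 0.0
    history = []
    for t in range(steps):
        x = 1.0          # bounded input
        h = w * h + x    # linear recurrence, no activation
        history.append(h)
    return np.array(history)

print(run_linear_recurrence(0.9)[-1])   # settles near 1 / (1 - 0.9) = 10  -> bounded
print(run_linear_recurrence(1.1)[-1])   # grows without bound              -> not BIBO stable
```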
What if we introduce a nonlinear activation?
Sigmoid saturates quickly.
Tanh and ReLU states can blow up or shrink to 0 (tanh itself is bounded, so its state can only saturate or decay).
ReLU behaves like a standard linear system: while the state stays positive the recurrence is linear, so it blows up or shrinks to 0 depending on the weight.
These curves reflect what the network remembers as time goes by.
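A rough sketch of those memory curves (the zero-input setup, weights, and step counts are my assumptions): start the state at 1, feed nothing afterwards, and watch how each activation carries the initial information forward.

```python
import numpy as np

def memory_curve(activation, w=1.0, h0=1.0, steps=20):
    """Iterate h_t = f(w * h_{t-1}) with no further input after t = 0."""
    h = h0
    curve = []
    for t in range(steps):
        h = activation(w * h)
        curve.append(h)
    return np.array(curve)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
relu    = lambda z: np.maximum(z, 0.0)

print(memory_curve(sigmoid)[:5])        # collapses to a fixed point ~0.66: initial info quickly lost
print(memory_curve(np.tanh)[:5])        # decays gradually: remembers for a while, then fades
print(memory_curve(relu, w=1.1)[:5])    # w > 1 blows up; w < 1 shrinks to 0, just like a linear system
```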
What if the input is a vector rather than a scalar?
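A sketch of the vector case, assuming a recurrence of the form h_t = W h_{t-1} + U x_t with no activation (the matrices below are random illustrations): with vectors, it is the largest eigenvalue magnitude (spectral radius) of W, not a single scalar weight, that decides whether a bounded input keeps the state bounded.

```python
import numpy as np

def run(spectral_radius, steps=100):
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))
    W = W * spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale W's spectral radius
    U = rng.normal(size=(4, 3))
    h = np.zeros(4)
    for _ in range(steps):
        x = np.ones(3)          # bounded vector input
        h = W @ h + U @ x       # linear vector recurrence
    return np.linalg.norm(h)

print(run(0.9))   # spectral radius < 1 -> state norm stays bounded
print(run(1.2))   # spectral radius > 1 -> state norm grows without bound
```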
Tanh remembers things for a while, but eventually the information saturates.
Another problem with RNNs (really a problem for any deep NN): vanishing gradients.
The maximum derivative of tanh and ReLU is 1, and of sigmoid only 0.25, so every factor in the backpropagated chain is <= 1 and the gradient can only shrink as it flows back through time.
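A sketch of why that matters for backpropagation through time (the pre-activation values below are made up for illustration): the gradient reaching early time steps contains one activation-derivative factor per step, and since each factor is at most 1 (at most 0.25 for sigmoid), the product shrinks rapidly with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

T = 30
z = np.linspace(-2.0, 2.0, T)   # hypothetical pre-activations along the unrolled chain

sig_grad  = np.prod(sigmoid(z) * (1 - sigmoid(z)))   # each factor <= 0.25
tanh_grad = np.prod(1 - np.tanh(z) ** 2)             # each factor <= 1
print(sig_grad, tanh_grad)   # both tiny after 30 steps: the gradient has effectively vanished
```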