Lecture 13 | (2/5) Recurrent Neural Networks

2019-11-02  Ysgc

https://www.youtube.com/watch?v=jaw5W0bCgUQ

We don't want the NN's response to blow up -> require BIBO (bounded-input, bounded-output) stability.
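A minimal numpy sketch (my own toy example, not from the lecture) of what BIBO is guarding against: in the scalar linear recurrence h_t = w·h_{t-1} + x_t, a bounded input already blows the state up whenever |w| > 1, while |w| < 1 keeps it bounded.

```python
import numpy as np

# Scalar linear recurrence h_t = w * h_{t-1} + x_t with a bounded input.
# Toy values of my own, just to illustrate BIBO (in)stability.
def run_recurrence(w, steps=50):
    h = 0.0
    xs = np.ones(steps)          # bounded input: all ones
    for x in xs:
        h = w * h + x
    return h

print(run_recurrence(1.1))   # |w| > 1: state blows up (~1e3 and still growing)
print(run_recurrence(0.9))   # |w| < 1: state stays bounded (~10)
```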

What if we introduce a nonlinear activation?

Sigmoid saturates quickly.
tanh and ReLU either blow up or shrink to 0, depending on the recurrent weight.
ReLU behaves much like the standard linear system.

These curves (the plots shown in the lecture) reflect what the NN remembers as time goes by.
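A rough sketch of those memory curves (my own toy values, not the lecture's plots): iterate h_t = f(w·h_{t-1}) from an initial value and watch how long the initial information survives under each activation.

```python
import numpy as np

# Iterate h_t = f(w * h_{t-1}) from an initial value and watch what survives.
# Arbitrary toy weight w = 1.5 (and 0.5 for the shrinking ReLU case).
def trajectory(f, w, h0=1.0, steps=20):
    h = h0
    out = []
    for _ in range(steps):
        h = f(w * h)
        out.append(h)
    return np.round(out, 3)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
relu    = lambda z: np.maximum(z, 0.0)

print(trajectory(sigmoid, 1.5))  # sticks near a fixed point almost immediately: saturates
print(trajectory(np.tanh, 1.5))  # holds a value for a while, then settles at a fixed point
print(trajectory(relu, 1.5))     # behaves like the linear system: blows up
print(trajectory(relu, 0.5))     # ... or shrinks to 0 when |w| < 1
```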

What if the input is a vector rather than a scalar?
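With a vector hidden state the scalar weight becomes a matrix W, and in the linear case it is the eigenvalues of W (the spectral radius) that decide blow-up vs. decay. A sketch with arbitrary 2×2 matrices of my own choosing:

```python
import numpy as np

# Linear vector recurrence h_t = W @ h_{t-1}; behaviour is governed by the
# spectral radius of W (largest |eigenvalue|). Arbitrary toy matrices.
def spectral_radius(W):
    return np.max(np.abs(np.linalg.eigvals(W)))

def norm_after(W, steps=50):
    h = np.ones(W.shape[0])
    for _ in range(steps):
        h = W @ h
    return np.linalg.norm(h)

W_grow  = np.array([[1.1, 0.0], [0.2, 0.9]])   # eigenvalues 1.1, 0.9
W_decay = np.array([[0.9, 0.0], [0.2, 0.7]])   # eigenvalues 0.9, 0.7

print(spectral_radius(W_grow),  norm_after(W_grow))   # radius > 1 -> norm explodes
print(spectral_radius(W_decay), norm_after(W_decay))  # radius < 1 -> norm shrinks toward 0
```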

tanh remembers things for a while, but eventually the information saturates.
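One way to see "remembers for a while, then the information is gone" (my own toy experiment, not the lecture's): run the same tanh recurrence with the same inputs from two different initial states; the gap between the trajectories shrinks step by step until the initial state is forgotten.

```python
import numpy as np

# Same tanh recurrence, same constant input, two different initial states:
# track how long the initial difference survives. Toy weights of my own,
# kept small so the map contracts.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.15, size=(4, 4))   # small recurrent weights
b = rng.normal(size=4)                    # constant input at every step

def run(h0, steps=40):
    h = h0.copy()
    traj = [h.copy()]
    for _ in range(steps):
        h = np.tanh(W @ h + b)
        traj.append(h.copy())
    return np.array(traj)

gaps = np.linalg.norm(run(np.ones(4)) - run(np.zeros(4)), axis=1)
print(np.round(gaps[::5], 4))   # gap shrinks step by step: the initial state is forgotten
```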

Another problem with RNNs: vanishing gradients.

This is a problem for any deep NN.

The maximum derivative of tanh and ReLU is 1, and sigmoid's is only 0.25; since no activation's derivative exceeds 1, the backpropagated gradient can only shrink (or at best stay the same) at every step.
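A back-of-the-envelope check (toy numbers of my own): backpropagating through T steps multiplies in an activation derivative at every step, and because those derivatives never exceed 1, the product collapses.

```python
import numpy as np

# Product of activation derivatives picked up while backpropagating through
# T = 100 time steps (recurrent weights ignored for simplicity).
rng = np.random.default_rng(0)
z = rng.normal(size=100)                 # toy pre-activations at each step

sigmoid   = lambda x: 1.0 / (1.0 + np.exp(-x))
d_sigmoid = lambda x: sigmoid(x) * (1.0 - sigmoid(x))   # max 0.25 at x = 0
d_tanh    = lambda x: 1.0 - np.tanh(x) ** 2             # max 1.0 at x = 0

print(np.prod(d_sigmoid(z)))   # astronomically small: the gradient has vanished
print(np.prod(d_tanh(z)))      # also tiny unless every z is exactly 0
```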
