Lecture 10 | Recurrent Neural Networks

2019-11-04  Ysgc

In 2014, before batch normalization was invented, training deep neural networks was hard.

For example, VGG was first trained with only 11 layers; more layers were then added and the network was trained again, so that it would converge.

Another example: GoogLeNet used auxiliary classifiers (early outputs) to inject extra gradient into the lower layers.

Backpropagating through the entire sequence is too expensive, so in practice we use truncated backpropagation through time: run forward through the whole sequence, but only backprop within one chunk at a time, carrying the hidden state forward (see the sketch below).

This is similar in spirit to using mini-batches instead of the full dataset.
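Below is a minimal PyTorch sketch of truncated backprop through time, under assumed names and sizes (`vocab_size`, `chunk_len`, random stand-in data); the key move is detaching the hidden state between chunks so gradients only flow within each chunk.

```python
import torch
import torch.nn as nn

# Hypothetical char-level setup: vocab size, one-layer RNN, linear readout.
vocab_size, hidden_size, chunk_len = 128, 256, 25

model = nn.RNN(input_size=vocab_size, hidden_size=hidden_size, batch_first=True)
readout = nn.Linear(hidden_size, vocab_size)
optimizer = torch.optim.Adam(list(model.parameters()) + list(readout.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake one-hot sequence data just to make the sketch runnable: (batch=1, time=1000, vocab).
data = torch.eye(vocab_size)[torch.randint(vocab_size, (1, 1000))]

hidden = None
for t in range(0, data.size(1) - chunk_len, chunk_len):
    inputs = data[:, t:t + chunk_len]                      # current chunk
    targets = data[:, t + 1:t + 1 + chunk_len].argmax(-1)  # next-char targets

    out, hidden = model(inputs, hidden)
    loss = loss_fn(readout(out).reshape(-1, vocab_size), targets.reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Truncation step: keep the hidden state's values, but cut the graph so
    # the next chunk does not backprop into this one.
    hidden = hidden.detach()
```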

A character-level RNN trained on the Linux kernel source code learned to recite the GNU license, which appears at the top of the source files.

The recited license even includes the FSF's old address, 675 Mass Ave, which is near Central Square in Cambridge.
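For completeness, here is a rough, self-contained sketch of how text is sampled from such a character-level model, one character at a time; the model here has random weights purely for illustration (in practice you would load trained weights), and the shapes mirror the hypothetical training sketch above.

```python
import torch
import torch.nn as nn

# Hypothetical char-level model; replace random weights with trained ones in practice.
vocab_size, hidden_size = 128, 256
model = nn.RNN(input_size=vocab_size, hidden_size=hidden_size, batch_first=True)
readout = nn.Linear(hidden_size, vocab_size)

def sample(first_char_idx: int, length: int = 200) -> list[int]:
    """Generate `length` character indices, feeding each sample back in as the next input."""
    indices = [first_char_idx]
    hidden = None
    with torch.no_grad():
        for _ in range(length):
            x = torch.eye(vocab_size)[indices[-1]].view(1, 1, vocab_size)  # one-hot input
            out, hidden = model(x, hidden)
            probs = torch.softmax(readout(out[0, -1]), dim=-1)
            indices.append(torch.multinomial(probs, 1).item())             # sample next char
    return indices
```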

The results are not perfect, though; there are clear failure cases.

Soft attention: take a weighted combination over all image locations; this is differentiable and trains with ordinary backprop (see the sketch below).
Hard attention: force the model to select exactly one location to look at; this is trickier because it is not differentiable, and is covered later in the reinforcement learning lecture.
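A minimal sketch of soft attention over image locations, assuming a grid of CNN features and an RNN hidden state as the query; the shapes, names, and dot-product scoring are illustrative, not the exact model from the lecture.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: a 7x7 grid of 512-d CNN features, and a 512-d RNN hidden state.
batch, height, width, feat_dim = 2, 7, 7, 512
features = torch.randn(batch, height * width, feat_dim)  # L = H*W image locations
hidden = torch.randn(batch, feat_dim)                     # current RNN hidden state (the query)

# Score each location against the hidden state (dot-product scoring for simplicity).
scores = torch.bmm(features, hidden.unsqueeze(2)).squeeze(2)    # (batch, L)

# Soft attention: softmax over locations, then a weighted sum of all location features.
weights = F.softmax(scores, dim=1)                              # (batch, L), sums to 1
context = torch.bmm(weights.unsqueeze(1), features).squeeze(1)  # (batch, feat_dim)

# `context` is fed to the RNN at the next step; every operation above is
# differentiable, so the whole thing trains with standard backprop.
```

Hard attention would instead sample a single location index from `weights`, which breaks differentiability and is why it needs reinforcement-learning-style training.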

RNNs are typically not very deep: 2, 3, or 4 layers is common.
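As an illustration, a stacked (multi-layer) RNN in PyTorch is just a `num_layers` argument; the sizes below are made up.

```python
import torch
import torch.nn as nn

# A 3-layer stacked LSTM: each layer's hidden states are the inputs to the layer above.
rnn = nn.LSTM(input_size=128, hidden_size=256, num_layers=3, batch_first=True)

x = torch.randn(8, 50, 128)   # (batch, time, features)
out, (h_n, c_n) = rnn(x)      # out: (8, 50, 256); h_n, c_n: (3, 8, 256), one per layer
```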