Recitation 3 | Deep Learning Optimization

2019-10-25  Ysgc

The gradient along the y-axis keeps decreasing as training progresses.

With plain SGD, the learning rate is the same for every parameter.

To refine this process, Adagrad is introduced: it gives each parameter its own effective learning rate by scaling the step with the inverse square root of that parameter's accumulated squared gradients.

sparse data -> only a few params are frequently updated, so per-parameter learning rates help: rarely-updated parameters keep a large effective step
automatically decaying LR -> pro or con? Pro: no manual schedule is needed. Con: the accumulated squared gradients only grow, so the effective learning rate can shrink toward zero and training stalls.
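
A minimal numpy sketch of the Adagrad update (the function name and default hyperparameters are my own illustration, not from the recitation). Each parameter divides the global learning rate by the square root of its own accumulated squared gradients:

```python
import numpy as np

def adagrad_step(param, grad, accum, lr=0.01, eps=1e-8):
    # Accumulate squared gradients; this sum only ever grows,
    # which is exactly the "decaying LR" con noted above.
    accum += grad ** 2
    # Per-parameter effective learning rate: lr / sqrt(accum).
    param -= lr * grad / (np.sqrt(accum) + eps)
    return param, accum

# Toy usage: the second coordinate's gradient is rarely nonzero,
# so it keeps a large effective step.
w = np.array([1.0, 2.0])
acc = np.zeros_like(w)
w, acc = adagrad_step(w, np.array([0.5, 0.0]), acc)
```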

RMSprop is essentially the same fix as Adadelta: both replace Adagrad's ever-growing sum of squared gradients with an exponential moving average, so the effective learning rate stops decaying to zero. (Adadelta additionally removes the global learning rate by rescaling with an RMS of past updates.)
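
A sketch of the shared idea, under the same illustrative setup as the Adagrad code above: keep a moving average of squared gradients instead of a raw sum, so old gradients fade out:

```python
import numpy as np

def rmsprop_step(param, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-8):
    # Exponential moving average of squared gradients: old terms
    # decay by a factor rho each step, unlike Adagrad's raw sum.
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    param -= lr * grad / (np.sqrt(avg_sq) + eps)
    return param, avg_sq
```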



Batch norm
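
Batch norm normalizes each feature of a mini-batch to zero mean and unit variance, then applies a learnable scale (gamma) and shift (beta). A minimal numpy sketch of the training-time forward pass (inference with running statistics is omitted; names are my own):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch; normalize each of the D features over the batch.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Learnable scale and shift restore the layer's expressive power.
    return gamma * x_hat + beta
```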

