Recitation 3 | Deep Learning Opt
2019-10-25
Ysgc


the gradient along the y-axis keeps decreasing

with plain SGD the learning rate is the same for every parameter
to refine this, Adagrad is introduced here: it adapts the learning rate per parameter


sparse data -> only a few params are updated frequently, so a per-parameter LR helps
automatically decaying LR -> pro or con? (steps shrink forever, which can stall training)
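The per-parameter update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the recitation's actual code; the function name and defaults are mine. Note how the accumulated squared gradients (`cache`) make the effective learning rate shrink fastest for frequently updated parameters, which is both the benefit for sparse data and the source of the "decaying forever" concern.

```python
import numpy as np

def adagrad_update(w, grad, cache, lr=0.01, eps=1e-8):
    """One Adagrad step (illustrative sketch)."""
    # Accumulate squared gradients, element-wise per parameter
    cache += grad ** 2
    # Each parameter gets its own effective LR: lr / sqrt(sum of squared grads)
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```

Because `cache` only grows, the step size is monotonically non-increasing; RMSprop and Adam replace the sum with an exponential moving average precisely to avoid this.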

Batch norm
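A minimal NumPy sketch of what a batch-norm layer computes at training time, assuming the standard formulation (normalize each feature over the batch, then apply a learnable scale `gamma` and shift `beta`); the running statistics used at inference are omitted.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over the batch axis (training-time sketch)."""
    # Per-feature mean and variance across the batch (axis 0)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # Normalize to zero mean / unit variance, then scale and shift
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```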
