Adaptive Learning Rate Algorithms

2017-06-17, 遥想yaoxiang

AdaGrad

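AdaGrad adapts the learning rate of every model parameter individually, and keeps decreasing the learning rate from the very start of training.
Parameters with larger gradients see a rapid decrease in their learning rate; parameters with smaller gradients see a relatively small decrease.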

Its drawback is that it can decrease the learning rate excessively; it performs better on convex functions.
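As a rough illustration of the behaviour described above, here is a minimal pure-Python sketch of one AdaGrad step. The function name `adagrad_step`, the list-of-floats representation, and the default values of `lr` and `delta` are assumptions made for this example and are not taken from the original post.

```python
import math


def adagrad_step(params, grads, accum, lr=0.01, delta=1e-7):
    """Apply one AdaGrad update; returns (new_params, new_accum).

    params, grads and accum are equal-length lists of floats.
    accum holds the running sum of squared gradients per parameter.
    """
    new_params, new_accum = [], []
    for p, g, r in zip(params, grads, accum):
        r = r + g * g  # the sum only grows, so the step size only shrinks
        p = p - lr * g / (delta + math.sqrt(r))  # per-parameter scaled step
        new_params.append(p)
        new_accum.append(r)
    return new_params, new_accum


# Hypothetical usage: two parameters with very different gradient sizes.
params, accum = [1.0, 1.0], [0.0, 0.0]
for _ in range(3):
    params, accum = adagrad_step(params, [10.0, 0.1], accum, lr=0.1)
# Effective learning rate lr / sqrt(accum) is much smaller for the first parameter.
effective_lr = [0.1 / math.sqrt(r) for r in accum]
```

Because `accum` is never reduced, the effective learning rate of every parameter can only go down over time, which is the source of the excessive decay noted above.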


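RMSProp

RMSProp introduces a hyperparameter alpha that controls how strongly the accumulated quantity depends on past gradient values.
Unlike AdaGrad, which sums up all past gradient values, RMSProp can keep the learning rate from becoming too small during training.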
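A minimal sketch of one RMSProp step may make the role of alpha concrete. It assumes the common exponentially weighted form of the accumulator; the function name `rmsprop_step` and the defaults for `lr`, `alpha` and `delta` are illustrative assumptions, not values from the original post.

```python
import math


def rmsprop_step(params, grads, accum, lr=0.001, alpha=0.9, delta=1e-6):
    """Apply one RMSProp update; returns (new_params, new_accum).

    accum is an exponentially decaying average of squared gradients.
    alpha close to 1 means a long memory of past gradients.
    """
    new_params, new_accum = [], []
    for p, g, r in zip(params, grads, accum):
        r = alpha * r + (1.0 - alpha) * g * g  # old history is gradually forgotten
        p = p - lr * g / math.sqrt(delta + r)
        new_params.append(p)
        new_accum.append(r)
    return new_params, new_accum


# Hypothetical usage: a constant gradient of 2.0 for many steps.
params, accum = [0.0], [0.0]
for _ in range(200):
    params, accum = rmsprop_step(params, [2.0], accum)
# accum settles near the squared gradient instead of growing without bound.
```

With a constant gradient the accumulator levels off near the squared gradient, so the step size stabilizes rather than shrinking toward zero.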


Combining momentum with RMSProp
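One widely used optimizer that combines a momentum-like term with RMSProp-style scaling is Adam. The original post does not name the algorithm here, so treating this line as a reference to Adam is an assumption. The sketch below follows the usual Adam update with bias correction; the function name `adam_step` and the default hyperparameters are likewise assumptions for illustration.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; returns (new_params, new_m, new_v).

    m: running average of gradients (the momentum-like first moment).
    v: running average of squared gradients (the RMSProp-like second moment).
    t: 1-based step counter, used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        m_hat = m_i / (1.0 - beta1 ** t)  # correct the bias from zero initialization
        v_hat = v_i / (1.0 - beta2 ** t)
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Hypothetical usage: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = [1.0], [0.0], [0.0]
for t in range(1, 101):
    x, m, v = adam_step(x, [2.0 * x[0]], m, v, t, lr=0.01)
```

Here the momentum enters as a first-moment estimate of the raw gradient, rather than being applied after the gradient has been rescaled.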


The most straightforward way to add momentum to RMSProp is to apply momentum to the rescaled gradients.
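The idea in the sentence above can be sketched as follows: rescale the gradient RMSProp-style first, then feed the rescaled gradient into a velocity term. This sketch uses plain (non-Nesterov) momentum, and the function name `rmsprop_momentum_step`, the argument order and the default hyperparameters are assumptions for illustration only.

```python
import math


def rmsprop_momentum_step(params, grads, velocity, accum,
                          lr=0.001, alpha=0.9, momentum=0.9, delta=1e-6):
    """Apply one RMSProp-with-momentum update.

    Returns (new_params, new_velocity, new_accum).
    """
    new_params, new_velocity, new_accum = [], [], []
    for p, g, vel, r in zip(params, grads, velocity, accum):
        r = alpha * r + (1.0 - alpha) * g * g  # RMSProp accumulator
        rescaled = g / math.sqrt(delta + r)    # gradient after rescaling
        vel = momentum * vel - lr * rescaled   # momentum acts on the rescaled gradient
        p = p + vel
        new_params.append(p)
        new_velocity.append(vel)
        new_accum.append(r)
    return new_params, new_velocity, new_accum


# Hypothetical usage: two consecutive steps with the same gradient.
p, vel, r = [1.0], [0.0], [0.0]
p, vel, r = rmsprop_momentum_step(p, [2.0], vel, r, lr=0.01)
p, vel, r = rmsprop_momentum_step(p, [2.0], vel, r, lr=0.01)
```

Setting `momentum=0.0` in this sketch reduces it to a plain RMSProp step, which makes the two easy to compare.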
