Optimizing Neural Networks with
2019-01-09
初七123
Background and notation
Neural Networks
The Natural Gradient
Fisher information matrix
We will instead compute $F$ using the training distribution $\hat{Q}_x$ over inputs $x$.
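For reference, the Fisher information matrix under this convention can be written out as below. This is a sketch in assumed notation: $\theta$ for the network parameters and $p(y \mid x, \theta)$ for the model's predictive distribution are not defined in these notes. Inputs are drawn from the training distribution, while targets are drawn from the model itself rather than from the training labels.

```latex
F = \mathbb{E}_{x \sim \hat{Q}_x}
    \left[
      \mathbb{E}_{y \sim p(y \mid x, \theta)}
      \left[
        \nabla_\theta \log p(y \mid x, \theta)\,
        \nabla_\theta \log p(y \mid x, \theta)^\top
      \right]
    \right]
```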
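The well-known natural gradient (Amari, 1998) is defined as $F^{-1} \nabla h$, where $h$ denotes the objective function and $F$ is the Fisher information matrix.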
The natural gradient defines the direction in parameter space that gives the largest change in the objective per unit of change in the model, as measured by the KL-divergence.
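The idea can be made concrete on a model small enough that the Fisher matrix is formed exactly. The following is a minimal sketch, not the method of the paper: it assumes a logistic regression model, and the function names (`fisher_logistic`, `gradient`, `natural_gradient`) and the `damping` term are hypothetical choices made for illustration. The Fisher matrix is averaged over the training inputs, with the expectation over targets taken under the model's own predictions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fisher_logistic(theta, X):
    """Fisher matrix of logistic regression (hypothetical helper).

    Inputs x come from the training set; targets y are averaged
    under the model's predictive distribution, not the labels.
    """
    p = sigmoid(X @ theta)      # model's P(y = 1 | x)
    w = p * (1.0 - p)           # E_y[(y - p)^2] under the model
    return (X * w[:, None]).T @ X / X.shape[0]


def gradient(theta, X, y):
    """Gradient of the mean negative log-likelihood."""
    p = sigmoid(X @ theta)
    return X.T @ (p - y) / X.shape[0]


def natural_gradient(theta, X, y, damping=1e-3):
    """Solve (F + damping * I) d = grad for d.

    The damping term is an assumption added for numerical
    stability; it is not part of the definition.
    """
    F = fisher_logistic(theta, X)
    g = gradient(theta, X, y)
    return np.linalg.solve(F + damping * np.eye(F.shape[0]), g)


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(3)
for _ in range(20):
    theta = theta - natural_gradient(theta, X, y)
```

Solving the linear system avoids forming the inverse of the Fisher matrix explicitly. For a neural network, the matrix has one row and column per parameter, so building and solving it directly like this quickly becomes impractical.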