Optimizing Neural Networks with

2019-01-09 本文已影响3人初七123

Background and notation

Neural Networks

The Natural Gradient

Fisher information matrix

we will instead compute Fusing the training distribution ˆQx over inputs x

The well-known natural gradient (Amari, 1998) is defined as

the natural gradient defines the direction in parameter space which gives the largest change in the objective per unit of change in the model, as measured by the KL-divergence

Optimizing Neural Networks with

Background and notation

A block-wise Kronecker-factored Fisher approximation

Additional approximations to ̃F and inverse computations

Update damping

Pseudocode for K-FAC

猜你喜欢

热点阅读