Optimizing Neural Networks with

2019-01-09  本文已影响3人  初七123

Background and notation

Neural Networks

The Natural Gradient

Fisher information matrix

we will instead compute Fusing the training distribution ˆQx over inputs x

The well-known natural gradient (Amari, 1998) is defined as

the natural gradient defines the direction in parameter space which gives the largest change in the objective per unit of change in the model, as measured by the KL-divergence

A block-wise Kronecker-factored Fisher approximation

Additional approximations to ̃F and inverse computations

Update damping

Pseudocode for K-FAC

上一篇 下一篇

猜你喜欢

热点阅读