PyTorch problems

2020-10-02  啊啊啊啊啊1231

1. NaN loss

This is a natural property of stochastic gradient descent: if the learning rate is too large, SGD can diverge, and the loss blows up to infinity and eventually turns into NaN.
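As a toy illustration (not from the original post): plain SGD on the quadratic f(x) = x² diverges once the learning rate exceeds 1, and with a deliberately oversized learning rate the loss overflows to inf and then becomes nan.

```python
import torch

# Toy divergence example: SGD on f(x) = x^2 multiplies x by (1 - 2*lr)
# each step, so any lr > 1 makes |x| grow without bound.
x = torch.tensor([5.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=10.0)  # deliberately far too large

for step in range(40):
    optimizer.zero_grad()
    loss = (x ** 2).sum()
    if step % 5 == 0 or torch.isnan(loss):
        print(step, loss.item())  # grows rapidly, overflows to inf, then nan
    if torch.isnan(loss):
        break
    loss.backward()
    optimizer.step()
```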

Solutions:

1) Reduce the learning rate.

2) Normalize the inputs; input normalization often helps.

3) Use the framework's built-in, numerically stable loss instead of a hand-rolled one. In the original (TensorFlow) answer this meant using tf.losses.sparse_softmax_cross_entropy(y, logits) instead of a custom "safe softmax" built on tf.nn.Softmax; the PyTorch equivalent is torch.nn.CrossEntropyLoss / F.cross_entropy applied to raw logits (see the sketch after this list).
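A minimal PyTorch sketch combining the three fixes; the model, data, and hyperparameters below are illustrative assumptions, not from the original post:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 5)                                   # toy classifier: 20 features, 5 classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # 1) smaller learning rate

x = torch.randn(64, 20) * 50 + 10                          # raw features on an awkward scale
y = torch.randint(0, 5, (64,))

# 2) normalize inputs (per-feature standardization)
x = (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)

for _ in range(100):
    optimizer.zero_grad()
    logits = model(x)
    # 3) built-in loss on raw logits: it applies log-softmax internally
    #    in a numerically stable way, instead of a separate softmax + log.
    loss = F.cross_entropy(logits, y)
    if torch.isnan(loss):
        raise RuntimeError("loss became NaN")
    loss.backward()
    optimizer.step()
```

If the loss still turns into NaN, torch.autograd.set_detect_anomaly(True) can help locate the operation that first produces it.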
