PyTorch problems

2020-10-02  啊啊啊啊啊1231

1. NaN loss

This is a natural property of stochastic gradient descent: if the learning rate is too large, SGD can diverge, and the loss blows up to infinity and eventually turns into NaN.
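As a toy illustration (not from the original post): plain SGD on the quadratic f(x) = x² diverges once the learning rate exceeds 1, and with a deliberately oversized learning rate the loss overflows to inf and then becomes nan.

```python
import torch

# Toy divergence example: SGD on f(x) = x^2 multiplies x by (1 - 2*lr)
# each step, so any lr > 1 makes |x| grow without bound.
x = torch.tensor([5.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=10.0)  # deliberately far too large

for step in range(40):
    optimizer.zero_grad()
    loss = (x ** 2).sum()
    if step % 5 == 0 or torch.isnan(loss):
        print(step, loss.item())  # grows rapidly, overflows to inf, then nan
    if torch.isnan(loss):
        break
    loss.backward()
    optimizer.step()
```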

Solutions:

1) Reduce the learning rate.

2) Normalize the inputs; input normalization often helps.

3) Use the framework's built-in, numerically stable loss instead of a hand-rolled one. In the original (TensorFlow) answer this meant using tf.losses.sparse_softmax_cross_entropy(y, logits) instead of a custom "safe softmax" built on tf.nn.Softmax; the PyTorch equivalent is torch.nn.CrossEntropyLoss / F.cross_entropy applied to raw logits (see the sketch after this list).
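A minimal PyTorch sketch combining the three fixes; the model, data, and hyperparameters below are illustrative assumptions, not from the original post:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 5)                                   # toy classifier: 20 features, 5 classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # 1) smaller learning rate

x = torch.randn(64, 20) * 50 + 10                          # raw features on an awkward scale
y = torch.randint(0, 5, (64,))

# 2) normalize inputs (per-feature standardization)
x = (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)

for _ in range(100):
    optimizer.zero_grad()
    logits = model(x)
    # 3) built-in loss on raw logits: it applies log-softmax internally
    #    in a numerically stable way, instead of a separate softmax + log.
    loss = F.cross_entropy(logits, y)
    if torch.isnan(loss):
        raise RuntimeError("loss became NaN")
    loss.backward()
    optimizer.step()
```

If the loss still turns into NaN, torch.autograd.set_detect_anomaly(True) can help locate the operation that first produces it.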
