Something about softmax

2019-06-20  捡个七

[1]. Softmax vs. Softmax-Loss: Numerical Stability

using LinearAlgebra  # for diagm

function softmax(z)
  z = z .- maximum(z)  # shift by the max to avoid overflow; the result is unchanged
  o = exp.(z)
  return o ./ sum(o)
end
function gradient_together(z, y)
  # fused gradient of cross-entropy ∘ softmax: softmax(z) - onehot(y)
  o = softmax(z)
  o[y] -= 1.0
  return o
end
function gradient_separated(z, y)
  # chain rule through the explicit softmax Jacobian diag(o) - o*o'
  o = softmax(z)
  ∂o_∂z = diagm(o) - o*o'
  ∂f_∂o = zeros(size(o))
  ∂f_∂o[y] = -1.0 / o[y]
  return ∂o_∂z * ∂f_∂o
end
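The point of the snippet is that the fused gradient and the chain-rule gradient are mathematically identical. A NumPy transcription (a sketch; the function names mirror the Julia ones) makes this easy to check numerically:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability; result is unchanged
    o = np.exp(z)
    return o / np.sum(o)

def gradient_together(z, y):
    # fused gradient of cross-entropy loss w.r.t. logits: softmax(z) - onehot(y)
    o = softmax(z)
    o[y] -= 1.0
    return o

def gradient_separated(z, y):
    # chain rule with the explicit softmax Jacobian: diag(o) - o o^T
    o = softmax(z)
    do_dz = np.diag(o) - np.outer(o, o)
    df_do = np.zeros_like(o)
    df_do[y] = -1.0 / o[y]
    return do_dz @ df_do

z = np.array([1.0, 2.0, 3.0])
g1 = gradient_together(z.copy(), 1)
g2 = gradient_separated(z.copy(), 1)
print(np.allclose(g1, g2))  # → True: the two formulations agree
```

Column `y` of the Jacobian, scaled by `-1/o[y]`, collapses exactly to `softmax(z) - onehot(y)`, which is why frameworks fuse the two steps.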

[2]. 反向传播之一:softmax函数

Exploiting this property (softmax is invariant to adding a constant to every input), we subtract the maximum value before exponentiating to prevent overflow; the computed result is unaffected.
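The overflow problem and the max-subtraction fix can be seen side by side; a minimal NumPy sketch (the `naive`/`stable` names are mine):

```python
import numpy as np

def naive_softmax(z):
    o = np.exp(z)              # exp of large inputs overflows to inf
    return o / np.sum(o)

def stable_softmax(z):
    # exp(z_i - m) / sum_j exp(z_j - m) == exp(z_i) / sum_j exp(z_j),
    # so subtracting the max cannot change the result
    o = np.exp(z - np.max(z))
    return o / np.sum(o)

z = np.array([1000.0, 1001.0, 1002.0])
naive = naive_softmax(z)       # inf / inf -> nan
stable = stable_softmax(z)
print(np.isnan(naive).any())   # → True
print(stable)                  # valid probabilities summing to 1
```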

[3]. PyTorch - VGG output layer - no softmax?

The reason is that you only need the softmax layer at inference time. During training, you can compute the loss directly from the logits without applying softmax, which reduces the amount of computation.
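This is why PyTorch's `nn.CrossEntropyLoss` takes raw logits: it fuses log-softmax and negative log-likelihood internally. A NumPy sketch of that fused computation, using the log-sum-exp trick (the function name is mine):

```python
import numpy as np

def cross_entropy_from_logits(z, y):
    # log-softmax via log-sum-exp: -log softmax(z)[y] computed without
    # ever forming an explicit softmax, and stable for large logits
    m = np.max(z)
    log_probs = z - m - np.log(np.sum(np.exp(z - m)))
    return -log_probs[y]

z = np.array([2.0, 1.0, 0.1])
loss = cross_entropy_from_logits(z, 0)
# same value as the explicit two-step computation
ref = -np.log(np.exp(z)[0] / np.exp(z).sum())
print(abs(loss - ref) < 1e-12)  # → True
```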
