KL Divergence

2018-10-19  力桓_7193

The entropy of a distribution P is H(P)=\sum_i{p_i\log{\frac{1}{p_i}}}, which reflects the amount of uncertainty in P. Among all distributions on a fixed finite support, the uniform distribution has the largest entropy.
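As a quick numerical check (a minimal NumPy sketch; the example distributions are made up here), a skewed distribution has lower entropy than the uniform one on the same support:

```python
import numpy as np

def entropy(p):
    """H(P) = sum_i p_i * log(1 / p_i); terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(1.0 / p[nz])))

print(entropy([0.7, 0.1, 0.1, 0.1]))      # ~0.94 nats
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ~ 1.386 nats, the maximum on 4 outcomes
```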
If we do not have prior knowledge about P and guess it to be Q, we add extra uncertainty, which gives the cross entropy H(P, Q)=\sum_i{p_i\log{\frac{1}{q_i}}}. From another perspective, cross entropy itself is a good alternative to the MSE loss when training with a sigmoid output, as the sketch below illustrates.
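One way to see why, sketched with a sigmoid output \sigma(z) and arbitrary example values (z=6, y=0): the MSE gradient with respect to z carries an extra \sigma(z)(1-\sigma(z)) factor that vanishes when the sigmoid saturates, whereas the cross-entropy gradient is simply \sigma(z)-y.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_mse(z, y):
    """d/dz of (sigmoid(z) - y)^2 -- carries the extra sigmoid'(z) factor."""
    s = sigmoid(z)
    return 2.0 * (s - y) * s * (1.0 - s)

def grad_cross_entropy(z, y):
    """d/dz of -(y*log(s) + (1-y)*log(1-s)) with s = sigmoid(z), which is s - y."""
    return sigmoid(z) - y

z, y = 6.0, 0.0                  # a confidently wrong, saturated prediction: sigmoid(6) ~ 0.998
print(grad_mse(z, y))            # ~0.0049 -- the gradient nearly vanishes
print(grad_cross_entropy(z, y))  # ~0.9975 -- still a strong learning signal
```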
The gap between H(P, Q) and H(P) is the relative entropy, also known as KL divergence: KL(P||Q)=H(P, Q)-H(P)=\sum_i{p_i\log{\frac{p_i}{q_i}}}.
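Putting the definitions together in a small sketch (p and q are again arbitrary toy distributions), KL(P||Q) is indeed the cross entropy minus the entropy:

```python
import numpy as np

def cross_entropy(p, q):
    """H(P, Q) = sum_i p_i * log(1 / q_i)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(1.0 / q[nz])))

def kl(p, q):
    """KL(P||Q) = sum_i p_i * log(p_i / q_i)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

p, q = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]
h_p = cross_entropy(p, p)         # H(P, P) is just H(P)
print(cross_entropy(p, q) - h_p)  # ~0.085 nats
print(kl(p, q))                   # the same value: KL(P||Q) = H(P, Q) - H(P)
```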
KL divergence is non-negative, which can be proved with Jensen's Inequality: since \log is concave, \sum_i{p_i\log{\frac{q_i}{p_i}}}\le\log{\sum_i{p_i\frac{q_i}{p_i}}}=\log{1}=0, hence KL(P||Q)\ge 0. Besides, KL divergence is asymmetric (KL(P||Q)\ne KL(Q||P)). However, we can define a symmetric variant as KL'(P||Q)=(KL(P||Q)+KL(Q||P))/2. More properties can be found here.
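A quick check of the asymmetry and of the symmetrized variant (reusing the same toy p and q; sym_kl is just an illustrative helper name):

```python
import numpy as np

def kl(p, q):
    """KL(P||Q) = sum_i p_i * log(p_i / q_i), as above."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

def sym_kl(p, q):
    """Symmetrized variant KL'(P||Q) = (KL(P||Q) + KL(Q||P)) / 2."""
    return 0.5 * (kl(p, q) + kl(q, p))

p, q = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]
print(kl(p, q))      # ~0.085 nats
print(kl(q, p))      # ~0.092 nats -- different, so KL is asymmetric
print(sym_kl(p, q))  # ~0.089 nats, and sym_kl(q, p) gives the same value
```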
