KL Divergence
2018-10-19
力桓_7193
Entropy of a distribution $P$ is $H(P) = -\sum_x P(x)\log P(x)$, which reflects the amount of uncertainty in $P$. Among all distributions over the same finite support, the uniform distribution has the largest entropy.
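As a quick illustration (not from the original post; the helper name `entropy` and the example distributions are made up here), the NumPy sketch below computes $H(P)$ and shows the uniform distribution attaining the maximum over four outcomes.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(P) = -sum_x P(x) log P(x), natural log; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

uniform = np.array([0.25, 0.25, 0.25, 0.25])
skewed = np.array([0.7, 0.1, 0.1, 0.1])

print(entropy(uniform))  # log(4) ≈ 1.386, the maximum over 4 outcomes
print(entropy(skewed))   # ≈ 0.940, less uncertainty
```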
If we do not have prior knowledge about $P$ and guess it to be $Q$, then we actually add extra uncertainty and obtain the cross entropy $H(P, Q) = -\sum_x P(x)\log Q(x)$. From another angle, cross entropy itself is a good alternative to the MSE loss when the output goes through a sigmoid, as demonstrated: it cancels the sigmoid's derivative in the gradient, avoiding the slow learning caused by saturation.
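Continuing the same sketch (reusing `np` and the hypothetical `entropy` above), a small check that guessing $Q$ in place of the true $P$ never reduces the measured uncertainty, i.e. $H(P, Q) \ge H(P)$:

```python
def cross_entropy(p, q):
    """Cross entropy H(P, Q) = -sum_x P(x) log Q(x)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

p = np.array([0.7, 0.1, 0.1, 0.1])      # true distribution
q = np.array([0.25, 0.25, 0.25, 0.25])  # our (uniform) guess

print(entropy(p))           # ≈ 0.940
print(cross_entropy(p, q))  # ≈ 1.386; the wrong guess adds uncertainty
```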
The discrepancy between $H(P, Q)$ and $H(P)$ is the relative entropy, also known as the KL divergence, formulated as $D_{KL}(P \,\|\, Q) = H(P, Q) - H(P) = \sum_x P(x)\log\frac{P(x)}{Q(x)}$.
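A numerical check of this identity, again with the hypothetical helpers from the previous sketches:

```python
def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

print(kl_divergence(p, q))               # ≈ 0.446
print(cross_entropy(p, q) - entropy(p))  # same value: the gap is exactly D_KL(P || Q)
```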
KL divergence is non-negative, which can be proved using Jensen's Inequality. Besides, KL divergence is asymmetric ($D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$). However, we can define a symmetric variant as $\frac{1}{2}\big(D_{KL}(P \,\|\, Q) + D_{KL}(Q \,\|\, P)\big)$. More properties can be found here.
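For completeness, a one-line sketch of the non-negativity argument (Jensen's inequality applied to the concave $\log$): $-D_{KL}(P \,\|\, Q) = \sum_x P(x)\log\frac{Q(x)}{P(x)} \le \log\sum_x P(x)\frac{Q(x)}{P(x)} = \log 1 = 0$. The snippet below (still using the hypothetical helpers defined earlier) shows the asymmetry numerically and one common symmetrization, the average of the two directions; the Jensen-Shannon divergence is another popular symmetric choice.

```python
print(kl_divergence(p, q))  # ≈ 0.446
print(kl_divergence(q, p))  # ≈ 0.430, so D_KL(P || Q) != D_KL(Q || P)

def symmetric_kl(p, q):
    """One common symmetrization: the average of the two directions."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

print(symmetric_kl(p, q), symmetric_kl(q, p))  # identical by construction
```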