Probability Distributions, Entropy, Indices, Correlation Analysis, Loss Functions, etc.

2021-05-25  第一个读书笔记

Setup:
For a two-dimensional random variable (x,y), the possible value pairs are (x_i,y_j), where i \in \lbrace 1,...,n \rbrace, j \in \lbrace 1,...,m \rbrace, and \sum_i \sum_j p_{i,j}=1,\ p_{i,j} \geq 0.

Probability Distributions

Type | Expression
Joint probability distribution | p(x,y) = p(x|y)p(y), i.e. p(x_i,y_j) = p(y_j|x_i)p(x_i) = p(x_i|y_j)p(y_j)
Conditional probability distribution | p(x|y) = \frac{p(x,y)}{p(y)}
Law of total probability | p(x) = p(y_1)p(x|y_1) + p(y_2)p(x|y_2) + ... + p(y_m)p(x|y_m) = \sum_{j=1}^m p(y_j)p(x|y_j)
x and y independent | p(x,y) = p(x)p(y)
x and y positively associated | p(x,y) > p(x)p(y)
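
To make these relations concrete, here is a minimal NumPy sketch (the 2x3 table `joint` is made up for illustration) that recovers the marginals, a conditional distribution, and the law of total probability from a joint probability table:

import numpy as np

# Hypothetical 2x3 joint table p(x_i, y_j): rows index x, columns index y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
p_x = joint.sum(axis=1)              # marginal p(x_i) = sum_j p(x_i, y_j)
p_y = joint.sum(axis=0)              # marginal p(y_j) = sum_i p(x_i, y_j)
p_x_given_y = joint / p_y            # conditional p(x_i | y_j) = p(x_i, y_j) / p(y_j)

# Law of total probability: p(x_i) = sum_j p(y_j) p(x_i | y_j)
assert np.allclose(p_x, (p_x_given_y * p_y).sum(axis=1))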

Entropy:
H(x) = E[I(x_i)] = -\sum_i p(x_i)\log p(x_i)

Conditional entropy:
\begin{align} H(y|x) &= \sum_i p(x_i)H(y|x_i) \\ &= -\sum_i p(x_i)\sum_j p(y_j|x_i)\log p(y_j|x_i) \\ &= H(x,y)-H(x) \end{align}

Joint entropy:
\begin{align} H(x,y) &= -\sum_i\sum_j p(x_i,y_j)\log p(x_i,y_j) \\ &= -\sum_i\sum_j p(x_i,y_j)\log\big(p(y_j|x_i)\,p(x_i)\big) \\ &= -\sum_{i,j}p(x_i,y_j)\log p(y_j|x_i) - \sum_{i,j}p(x_i,y_j)\log p(x_i) \\ &= H(y|x) + H(x) \end{align}
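
A minimal sketch, reusing a hypothetical joint table of the same shape as above, that checks the chain rule H(x,y) = H(y|x) + H(x) numerically:

import numpy as np

joint = np.array([[0.10, 0.20, 0.10],    # hypothetical p(x_i, y_j)
                  [0.25, 0.15, 0.20]])
p_x = joint.sum(axis=1)
p_y_given_x = joint / p_x[:, None]

H_x = -np.sum(p_x * np.log2(p_x))                     # H(x)
H_xy = -np.sum(joint * np.log2(joint))                # H(x, y)
H_y_given_x = -np.sum(joint * np.log2(p_y_given_x))   # H(y | x)
assert np.isclose(H_xy, H_y_given_x + H_x)            # H(x, y) = H(y | x) + H(x)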

Cross entropy:
Binary classification: H(y,\hat y) = -\frac{1}{m}\sum_i^m \big(y_i\log \hat y_i + (1-y_i)\log(1-\hat y_i)\big)
Multi-class classification: H(y,\hat y) = -\frac{1}{m}\sum_i^m \sum_c y_{i,c}\log \hat y_{i,c}
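
A minimal NumPy sketch of both forms (the function names are my own; predicted probabilities are assumed to lie strictly inside (0,1) so the logs stay finite):

import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # y_true in {0, 1}; y_pred = predicted probability of the positive class
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):
    # y_true: one-hot labels of shape (m, C); y_pred: predicted probabilities of shape (m, C)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))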

Perplexity:
PP(p) = b^{H(p)} = b^{-\sum_i p(x_i)log_b p(x_i)}
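
For example, a uniform distribution over k outcomes has perplexity k; a small sketch (the helper name `perplexity` is mine):

import numpy as np

def perplexity(p, b=2):
    # PP(p) = b ** H(p), with the entropy H(p) taken in base b
    p = np.asarray(p, dtype=float)
    return b ** (-np.sum(p * np.log(p) / np.log(b)))

print(perplexity([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 outcomes -> 4.0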

Various Indices

Pointwise mutual information (PMI):
PMI(x,y) = PMI(y,x) = \log \frac{p(x,y)}{p(x)p(y)} = \log \frac{p(x|y)}{p(x)} = \log \frac{p(y|x)}{p(y)}
PMI = 0 means the outcomes x and y are independent (illustrated in the sketch after the MI formulas below).

Mutual information (MI):
I(x,y) = \sum_i\sum_j p(x_i,y_j)\log \frac{p(x_i,y_j)}{p(x_i)p(y_j)} = \sum_i\sum_j p(x_i,y_j)\,PMI(x_i,y_j)
I(x,y) = H(x) - H(x|y) = H(y) - H(y|x) = H(x) + H(y) - H(x,y)

MI is non-negative; it equals 0 if and only if the random variables x and y are independent.
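
The same kind of hypothetical joint table can illustrate both PMI and MI, and check the identity I(x,y) = H(x) + H(y) - H(x,y):

import numpy as np

joint = np.array([[0.10, 0.20, 0.10],     # hypothetical p(x_i, y_j)
                  [0.25, 0.15, 0.20]])
p_x = joint.sum(axis=1, keepdims=True)    # column vector of p(x_i)
p_y = joint.sum(axis=0, keepdims=True)    # row vector of p(y_j)

pmi = np.log2(joint / (p_x * p_y))        # PMI(x_i, y_j) for every cell
mi = np.sum(joint * pmi)                  # I(x, y) = E[PMI]

H = lambda p: -np.sum(p * np.log2(p))
assert np.isclose(mi, H(p_x) + H(p_y) - H(joint))   # I(x,y) = H(x) + H(y) - H(x,y)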

Information gain (Info Gain):
Gain(D,a) = H(D) - \sum_v \frac{|D_v|}{|D|}H(D_v), where D_v is the subset of dataset D on which feature a takes value v.
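
A minimal sketch of information gain for a categorical split, using empirical class frequencies (the helpers `entropy` and `info_gain` are my own names):

import numpy as np

def entropy(labels):
    # empirical entropy of a discrete label array
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(labels, feature):
    # Gain = H(D) - sum_v |D_v|/|D| * H(D_v), splitting D on the values of `feature`
    labels, feature = np.asarray(labels), np.asarray(feature)
    weighted = sum((feature == v).mean() * entropy(labels[feature == v])
                   for v in np.unique(feature))
    return entropy(labels) - weighted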

Gini index:
Gini(x) = \sum_i\sum_{j \neq i} p_i p_j
When a tree model splits a node: Gini(x) = \sum_i p_i(1-p_i) = 1-\sum_i p_i^2
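
A corresponding sketch for the Gini index of a node, again from empirical class frequencies:

import numpy as np

def gini(labels):
    # Gini(D) = 1 - sum_i p_i^2, with p_i the empirical class frequencies in the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(["a", "a", "b", "b"]))   # an evenly mixed two-class node -> 0.5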

KL divergence (KLD):
\begin{align} D_{KL}(p||q) &= \sum_x p(x)\log\frac{p(x)}{q(x)} \\ &= \sum_x p(x)\log p(x) - \sum_x p(x)\log q(x) \\ &= H(p,q)-H(p) \end{align}
Measures the dissimilarity of two distributions (here H(p,q) denotes the cross entropy): the smaller, the more similar; it is 0 only when p = q, and it is not symmetric.

Jensen-Shannon divergence (JSD):
JS(P||Q) = JS(Q||P)
JS(P||Q) = \frac{1}{2}KL(P||M) + \frac{1}{2}KL(Q||M), \quad M = \frac{1}{2}(P+Q)

import numpy as np

def kl_divergence(p, q):
    # KL(p || q) in bits: elementwise p * log2(p / q), summed
    # (cf. scipy.special.rel_entr, which uses the natural log)
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(np.where(p != 0, p * np.log2(p / q), 0.0))

def js_divergence(p, q):
    # symmetric: average the KL divergences to the mixture M = (p + q) / 2
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
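
A quick check with two made-up distributions, using the functions above (KL is asymmetric, JS is symmetric):

p = np.array([0.10, 0.40, 0.50])
q = np.array([0.80, 0.15, 0.05])
print(kl_divergence(p, q), kl_divergence(q, p))   # generally not equal
print(js_divergence(p, q), js_divergence(q, p))   # equal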

Loss Function

MSE: used in regression models; sensitive to outliers; converges faster
L_{MSE} = \frac{1}{m}\sum_i^m(y_i-\hat y_i)^2

MAE: a loss function used in GBDT; less sensitive to outliers
L_{MAE} = \frac{1}{m}\sum_i^m |y_i-\hat y_i|

Hinge Loss (SVM), with labels y \in \lbrace -1,+1 \rbrace
L_{hinge} = \max(0, 1-y\hat y)

Huber Loss
L_{Huber} = \begin{cases} \frac{1}{2}(y-\hat y)^2, & \text{if } |y-\hat y| \leq \delta \\ \delta|y-\hat y|-\frac{1}{2}\delta^2, & \text{otherwise} \end{cases}
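
A minimal NumPy sketch of the four losses above (the function names are mine; `hinge` assumes labels in {-1, +1}):

import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def hinge(y, y_hat):
    # labels y in {-1, +1}; y_hat is the raw model score
    return np.mean(np.maximum(0.0, 1.0 - y * y_hat))

def huber(y, y_hat, delta=1.0):
    err = np.abs(y - y_hat)
    return np.mean(np.where(err <= delta,
                            0.5 * err ** 2,
                            delta * err - 0.5 * delta ** 2))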
