PyTorch loss functions

2020-02-13, 一杭oneline

PyTorch weight initialization and loss functions

Exploding and vanishing gradients

Why do these problems occur?

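For independent random variables $X$ and $Y$, with expectation $E$ and variance $D$:

$E(X*Y) = E(X) * E(Y)$

$D(X) = E(X^2)-[E(X)]^2$

$D(X+Y) = D(X)+D(Y)$

$D(X*Y) = D(X)*D(Y)+D(X)*[E(Y)]^2+D(Y)*[E(X)]^2$

If $E(X)=0$ and $E(Y)=0$, this reduces to

$D(X*Y) = D(X)*D(Y)$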

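Take the first neuron of the first hidden layer, whose $n$ inputs $X_i$ and weights $W_{1i}$ all have zero mean and unit variance:

$H_{11} = \sum_{i=1}^{n}X_i*W_{1i}$

$D(H_{11}) =\sum_{i=1}^{n}D(X_i)*D(W_{1i})= n*1*1 = n$

Every layer therefore multiplies the variance by $n$ (the standard deviation by $\sqrt{n}$), where $n$ is the number of input neurons of the layer, not the batch size. After a few layers the values blow up and the gradients explode, so this has to be avoided.

To keep the variance at 1 we need

$D(H_1) = n*D(X)*D(W)=1$

$D(W)=1/n$, $STD(W) = \sqrt{1/n}$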
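The effect of the two choices of $D(W)$ can be checked numerically. The following is only a minimal sketch in plain Python (no PyTorch); the helper `forward_std`, the layer width 32, the depth 4 and the batch size 100 are assumptions chosen for illustration, and the layers have no bias and no activation.

```python
import math
import random


def forward_std(n, layers, weight_std, batch=100, seed=0):
    """Push a random batch through `layers` linear layers (no bias, no
    activation) and return the standard deviation of the final outputs."""
    rng = random.Random(seed)
    # Inputs with zero mean and unit variance.
    x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(batch)]
    for _ in range(layers):
        # w[j] holds the n weights of output neuron j, entries ~ N(0, weight_std^2).
        w = [[rng.gauss(0.0, weight_std) for _ in range(n)] for _ in range(n)]
        x = [[sum(xi * wi for xi, wi in zip(row, w_j)) for w_j in w] for row in x]
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


n = 32
std_unit = forward_std(n, layers=4, weight_std=1.0)                   # D(W) = 1
std_scaled = forward_std(n, layers=4, weight_std=math.sqrt(1.0 / n))  # D(W) = 1/n

print(std_unit)    # on the order of sqrt(n) ** 4 = 1024
print(std_scaled)  # stays close to 1
```

With $D(W)=1$ the standard deviation grows by roughly $\sqrt{n}$ per layer, while $D(W)=1/n$ keeps it near 1.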

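Once activation functions are added, the layer outputs shrink from layer to layer instead, which leads to vanishing gradients.

Variance consistency: keep the scale of the data within a suitable range, usually with a variance of 1.

Saturating functions, such as sigmoid and tanh

$n_i*D(W)=1$, $n_{i+1}*D(W) = 1$, where $n_i$ is the number of neurons in the input layer and $n_{i+1}$ is the number of neurons in the output layer

Both conditions can only hold together when $n_i = n_{i+1}$, so a compromise between them is used:

$D(W)=2/(n_i+n_{i+1})$

A uniform distribution is usually used, $W \sim U[-a,a]$

$D(W) = \frac{(a-(-a))^2}{12}=\frac{a^2}{3}$

which gives $a = \frac{\sqrt{6}}{\sqrt{n_i+n_{i+1}}}$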
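As a quick check of the derivation, the bound $a$ can be computed for a concrete layer and plugged back into the variance of a uniform distribution. This is a sketch only: the helper name `xavier_uniform_bound`, the layer sizes 256 and 128, and the optional `gain` factor (which scales the bound for a given activation) are assumptions for illustration.

```python
import math


def xavier_uniform_bound(n_in, n_out, gain=1.0):
    """Bound a of U[-a, a] such that D(W) = gain^2 * 2 / (n_in + n_out)."""
    return gain * math.sqrt(6.0 / (n_in + n_out))


n_in, n_out = 256, 128           # hypothetical layer sizes
a = xavier_uniform_bound(n_in, n_out)
variance = a ** 2 / 3.0          # variance of U[-a, a]

print(a)                         # 0.125
print(variance)                  # equals 2 / (n_in + n_out)
```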

For non-saturating functions, such as $ReLU$ and its variants

$D(W) = \frac{2}{n_i}$

For $ReLU$ variants that have a slope $a$ on the negative half-axis

$D(W) = \frac{2}{(1+a^2)*n_i}$
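The two variance formulas translate directly into a standard deviation for a normal initializer. A small sketch, where the helper `kaiming_std`, the fan-in of 512 and the slope 0.1 are illustrative assumptions:

```python
import math


def kaiming_std(n_in, negative_slope=0.0):
    """Standard deviation for D(W) = 2 / ((1 + a^2) * n_in)."""
    return math.sqrt(2.0 / ((1.0 + negative_slope ** 2) * n_in))


n_in = 512                                          # hypothetical fan-in
std_relu = kaiming_std(n_in)                        # plain ReLU, a = 0
std_leaky = kaiming_std(n_in, negative_slope=0.1)   # leaky variant, a = 0.1

print(std_relu)   # 0.0625
print(std_leaky)  # slightly smaller than std_relu
```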

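import torch
import torch.nn as nn

# m is not defined in the original snippet; a linear layer is assumed here
m = nn.Linear(256, 256)

# Xavier initialization: meant for saturating activations, not suitable for ReLU
tanh_gain = nn.init.calculate_gain('tanh')
nn.init.xavier_uniform_(m.weight.data, gain=tanh_gain)

# Kaiming initialization: for ReLU and its variants
nn.init.kaiming_normal_(m.weight.data)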

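Loss functions

$Loss = f(\hat{y},y)$: the loss of a single sample; this is the term normally used

$Cost = \frac{1}{N}\sum_{i=1}^{N}{f(\hat{y_i},y_i)}$: the average loss over the whole sample set

$Obj = Cost + Regularization$: the objective function, which adds a regularization term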
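The three quantities can be told apart in a few lines. The sketch below uses squared error as $f$ and an L2 penalty as the regularization term; both choices, as well as all numbers and the names `squared_error` and `lam`, are assumptions made only for this example.

```python
def squared_error(y_hat, y):
    """Loss f(y_hat, y) of a single sample (squared error as an example)."""
    return (y_hat - y) ** 2


y_hat = [2.5, 0.0, 2.0]     # hypothetical predictions
y = [3.0, -0.5, 2.0]        # hypothetical targets
weights = [0.5, -1.0]       # hypothetical model parameters
lam = 0.01                  # hypothetical regularization strength

losses = [squared_error(p, t) for p, t in zip(y_hat, y)]   # one Loss per sample
cost = sum(losses) / len(losses)                           # Cost: mean over samples
regularization = lam * sum(w ** 2 for w in weights)        # L2 penalty
obj = cost + regularization                                # Obj = Cost + Regularization

print(losses, cost, obj)
```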

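Loss classes in PyTorch inherit from nn.Module. Their main parameter is reduction: 'none' computes the loss element by element, 'sum' adds the elements up, and 'mean' averages them.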

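1.nn.CrossEntropyLoss()

Cross-entropy loss

It is computed by combining nn.LogSoftmax() and nn.NLLLoss().

Cross-entropy measures the difference between two probability distributions and builds on information entropy and relative entropy.

Cross-entropy = information entropy + relative entropy (entropy expresses the uncertainty of a piece of information)

Self-information: $I(x)=-\log P(x)$

Entropy: $H(P) = E_{x\sim P}[I(x)] = -\sum_{i=1}^N P(x_i)\log P(x_i)$

Relative entropy ($KL$ divergence): $D_{KL}(P,Q) = E_{x\sim P}[\log\frac{P(x)}{Q(x)}] = H(P,Q)-H(P)$

Cross-entropy: $H(P,Q)=-\sum_{i=1}^{N}P(x_i)\log Q(x_i) = D_{KL}(P,Q)+H(P)$

$H(P)$ is fixed by the training data, so minimizing the cross-entropy is the same as minimizing the $KL$ divergence: the smaller, the better.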
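The relation between the three quantities is easy to verify numerically. A minimal sketch in plain Python, with two example distributions chosen for illustration:

```python
import math

P = [0.9, 0.05, 0.05]   # target distribution (example values)
Q = [0.8, 0.1, 0.1]     # predicted distribution (example values)

entropy = -sum(p * math.log(p) for p in P)                    # H(P)
cross_entropy = -sum(p * math.log(q) for p, q in zip(P, Q))   # H(P, Q)
kl = sum(p * math.log(p / q) for p, q in zip(P, Q))           # D_KL(P, Q)

print(entropy, cross_entropy, kl)
print(abs(cross_entropy - (kl + entropy)) < 1e-12)            # True
```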

import torch
import torch.nn as nn

# 3 samples, 2 classes: each row holds the raw scores of one sample
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
# Class indices starting from 0. The loss compares the output of the last layer
# with the true label, and the label works like an index, hence dtype long.
# In one-hot form these targets are [[1, 0], [0, 1], [0, 1]]; with four outputs
# [x1, x2, x3, x4], the one-hot label [0, 1, 0, 0] corresponds to index 1.
target = torch.tensor([0, 1, 1], dtype=torch.long)

# Per-class weights: class 0 gets weight 1, class 1 gets weight 2
# alternative: weights = torch.tensor([0.7, 0.3], dtype=torch.float)
weights = torch.tensor([1, 2], dtype=torch.float)
loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')
# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)
# 'none': one loss value per sample
# 'sum':  the per-sample losses are added up
# 'mean': weighted average; when weight is set, the sum is divided by the total
#         weight of the samples (1 + 2 + 2 = 5 here), as if each class-1 sample
#         had been duplicated
print(loss_none_w, loss_sum, loss_mean)

# Output
# tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)
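The printed values can be reproduced by hand, which also makes the weighted 'mean' explicit. The following is a plain-Python re-computation, not PyTorch's implementation; the helper `log_softmax` is written here only for this check.

```python
import math

inputs = [[1.0, 2.0], [1.0, 3.0], [1.0, 3.0]]   # raw scores from the example
target = [0, 1, 1]                              # class indices
weights = [1.0, 2.0]                            # per-class weights


def log_softmax(row):
    """log(softmax(row)), with the usual max-shift for numerical stability."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


# 'none': weight of the target class times the negative log-probability
loss_none = [-weights[t] * log_softmax(row)[t] for row, t in zip(inputs, target)]
loss_sum = sum(loss_none)
# 'mean': divide by the summed weights of the targets, not by the sample count
loss_mean = loss_sum / sum(weights[t] for t in target)

print([round(v, 4) for v in loss_none], round(loss_sum, 4), round(loss_mean, 4))
```

Dividing by 5 (the summed weights) rather than by 3 (the number of samples) is what turns 1.8210 into 0.3642.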

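2.nn.NLLLoss()

Negative log-likelihood loss

It only picks the input value at the target position and negates it. The loss does not take the logarithm itself, so the input is expected to hold log-probabilities, for example the output of nn.LogSoftmax().

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')
# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)
print(loss_none_w, loss_sum, loss_mean)
# Output:
# tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)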

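3.nn.BCELoss()

Binary cross-entropy loss
$l_n=-w_n[y_n\cdot \log x_n+(1-y_n)\cdot \log(1-x_n)]$
The target must be of type $torch.float$, like the input.

Every input value must lie in [0,1], so the raw outputs have to be converted with sigmoid() first.

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
# The loss is computed element by element, one value per output neuron
target_bce = target
# Squash the raw outputs into [0, 1]
inputs = torch.sigmoid(inputs)
weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')
# forward, same pattern as above
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)
# Output:
# BCE Loss tensor([[0.3133, 2.1269],
#         [0.1269, 2.1269],
#         [3.0486, 0.0181],
#         [4.0181, 0.0067]])
# tensor(11.7856) tensor(1.4732)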

4.nn.BCEWithLogitsLoss()

Combines sigmoid with binary cross-entropy

The network must not end with a sigmoid layer, because this loss applies the sigmoid itself.
$l_n=-w_n[y_n\cdot \log\sigma(x_n)+(1-y_n)\cdot \log(1-\sigma(x_n))]$

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

target_bce = target

# No torch.sigmoid(inputs) here: the loss takes the raw outputs

weights = torch.tensor([1], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float)        # losses of positive targets (y = 1) are multiplied by 3
# Parameters: pos_weight scales the term of the positive targets; weight rescales the loss values
loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
# Output with weights = tensor([1., 1.]) (the values equal the BCELoss example,
# so this run apparently did not use pos_weight):
# BCE Loss tensor([[0.3133, 2.1269],
#         [0.1269, 2.1269],
#         [3.0486, 0.0181],
#         [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
#
# Output with pos_weights = tensor([3.]):
# tensor([[0.9398, 2.1269],
#         [0.3808, 2.1269],
#         [3.0486, 0.0544],
#         [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
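Both output tables can be reproduced without PyTorch, which shows that pos_weight only touches the entries whose target is 1. This is a hand-written sketch of the formula with a `pos_weight` factor on the positive term; the function name `bce_with_logits` is an assumption for this example and the code is not the library's implementation.

```python
import math


def bce_with_logits(x, y, pos_weight=1.0):
    """-[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]"""
    # log(sigmoid(x)) = -log(1 + exp(-x)), log(1 - sigmoid(x)) = -log(1 + exp(x))
    log_sig = -math.log1p(math.exp(-x))
    log_one_minus_sig = -math.log1p(math.exp(x))
    return -(pos_weight * y * log_sig + (1.0 - y) * log_one_minus_sig)


inputs = [[1.0, 2.0], [2.0, 2.0], [3.0, 4.0], [4.0, 5.0]]
target = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

plain = [[bce_with_logits(x, y) for x, y in zip(xr, yr)]
         for xr, yr in zip(inputs, target)]
weighted = [[bce_with_logits(x, y, pos_weight=3.0) for x, y in zip(xr, yr)]
            for xr, yr in zip(inputs, target)]

for row in plain:
    print([round(v, 4) for v in row])
for row in weighted:
    print([round(v, 4) for v in row])
```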

5.nn.L1Loss()

Computes the absolute value of the difference between inputs and target
$l_n=|x_n-y_n|$

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3
loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))
# Output:
# input:tensor([[1., 1.],
#         [1., 1.]])
# target:tensor([[3., 3.],
#         [3., 3.]])
# L1 loss:tensor([[2., 2.],
#         [2., 2.]])

6.nn.MSELoss()

Computes the square of the difference between inputs and target: $l_n=(x_n-y_n)^2$

reduction: the computation mode, one of none/sum/mean

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3
loss_f_mse = nn.MSELoss(reduction='none')
loss_mse = loss_f_mse(inputs, target)
# Output:
# MSE loss:tensor([[4., 4.],[4., 4.]])

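7.nn.SmoothL1Loss()

A smoothed version of L1Loss:
$loss(x,y)=\frac{1}{n}\sum_i z_i, \quad z_i=\begin{cases} 0.5(x_i-y_i)^2, & |x_i-y_i|<1 \\ |x_i-y_i|-0.5, & \text{otherwise} \end{cases}$

It is smoother than L1Loss at the bottom, where the difference is close to zero.

import numpy as np

inputs = torch.linspace(-3, 3, steps=500)
target = torch.zeros_like(inputs)
loss_f = nn.SmoothL1Loss(reduction='none')
loss_smooth = loss_f(inputs, target)
loss_l1 = np.abs(inputs.numpy())  # plain L1 loss against the zero target, for comparison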
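To see where the two curves differ, the per-element formula can be evaluated at a few points. A small plain-Python sketch, with the switch-over point fixed at 1 as in the formula of this section; the function names are chosen here for illustration.

```python
def smooth_l1(x, y):
    """Per-element smooth L1 value z_i (switch-over point fixed at 1)."""
    diff = abs(x - y)
    return 0.5 * diff ** 2 if diff < 1 else diff - 0.5


def l1(x, y):
    """Per-element L1 value |x - y|."""
    return abs(x - y)


for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    print(x, l1(x, 0.0), smooth_l1(x, 0.0))
```

Inside $(-1, 1)$ the loss is quadratic, so its slope goes to zero at the minimum; outside it runs parallel to the L1 loss, shifted down by 0.5.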

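8.nn.PoissonNLLLoss()

Negative log-likelihood loss for a Poisson distribution

log_input: whether the input is given in logarithmic form; this decides which formula is used
$log\_input=True: loss(input,target) = exp(input)-target*input; \\ log\_input=False: loss(input,target) = input-target*log(input+eps)$
full: whether to compute the full loss, that is, to add the Stirling approximation term; default False

eps: a small correction term that avoids $NaN$

# Random targets only demonstrate the formula; real Poisson targets are non-negative counts
inputs = torch.randn((2, 2))
target = torch.randn((2, 2))
loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))
# Output:
# input:tensor([[0.6614, 0.2669],[0.0617, 0.6213]])
# target:tensor([[-0.4519, -0.1661],[-1.5228,  0.3817]])
# Poisson NLL loss:tensor([[2.2363, 1.3503],[1.1575, 1.6242]])

# Check the first element by hand
idx = 0
loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx]*inputs[idx, idx]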

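9.nn.KLDivLoss()

KL divergence (relative entropy); measures how much two distributions differ
$D_{KL}(P||Q) = E_{x\sim P}[\log{\frac{P(x)}{Q(x)}}]=E_{x\sim P}[\log P(x)-\log Q(x)]=\sum_{i=1}^nP(x_i)(\log P(x_i)-\log Q(x_i))$

This function computes
$l_n = y_n\cdot(\log{y_n}-x_n)$
Here $y_n$ is already a probability distribution (the target distribution): for example, $y$ has probability 0.9 for the first class, 0.05 for the second and 0.05 for the third, i.e. [0.9,0.05,0.05]. $x_n$ comes from the distribution that the network outputs after its layers, for example [0.8,0.1,0.1].

Explanation of the formula: it is evaluated per element, so the $\sum$ is dropped, and $x_n$ has to be given as $log$-probabilities, obtained with a logarithm or with nn.LogSoftmax() when the network produces a multi-class distribution.

batchmean: averages over the batch-size dimension

inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])   # each row is a distribution
inputs_log = torch.log(inputs)                              # the log-probabilities the loss expects
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

# Note: these calls pass `inputs` (probabilities), which is what produced the
# printed values; pass `inputs_log` instead to obtain an actual KL divergence.
loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))

# Output:
# loss_none:tensor([[-0.5448, -0.1648, -0.1598],[-0.2503, -0.4597, -0.4219]])
#   warnings.warn("reduction: 'mean' divides the total loss by both the batch size and the support size."
# loss_mean:-0.3335360586643219
# loss_bs_mean:-1.000608205795288

# Check the first element by hand
idx = 0
loss_1 = target[idx, idx] * (torch.log(target[idx, idx]) - inputs[idx, idx])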
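A KL divergence between two distributions is never negative, so negative values in this example indicate that the loss was fed probabilities rather than log-probabilities. The sketch below recomputes the example in plain Python with $x_n = \log q_n$; it is an independent hand calculation, not output taken from PyTorch.

```python
import math

inputs = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]       # predicted distributions Q
target = [[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]]     # target distributions P

# With x_n = log q_n the element-wise term is l_n = y_n * (log y_n - log q_n)
loss_none = [[p * (math.log(p) - math.log(q)) for p, q in zip(p_row, q_row)]
             for p_row, q_row in zip(target, inputs)]

kl_per_sample = [sum(row) for row in loss_none]                   # D_KL(P || Q) per sample
loss_bs_mean = sum(kl_per_sample) / len(kl_per_sample)            # 'batchmean'
loss_mean = sum(kl_per_sample) / sum(len(r) for r in loss_none)   # 'mean'

print([round(v, 4) for v in kl_per_sample], round(loss_bs_mean, 4), round(loss_mean, 4))
```

Individual elements can still be negative, but each per-sample sum is positive, and 'batchmean' divides by the number of samples while 'mean' divides by the number of elements.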

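10.nn.MarginRankingLoss()

Compares two sets of values pair by pair and is used for ranking tasks. In the example below the inputs have shape (3, 1) and the target has shape (3,), so broadcasting combines every pair with every target value and returns an $N*N$ loss matrix; recent PyTorch releases may instead require all three tensors to have the same number of dimensions.
$loss(x,y)=max(0 ,-y*(x1-x2)+margin)$
y=1: x1 is expected to be larger than x2; when x1>x2 by at least margin, no loss is produced

y=-1: x1 is expected to be smaller than x2; when x2>x1 by at least margin, no loss is produced

margin: the boundary value

reduction: the computation mode

x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)

target = torch.tensor([1, 1, -1], dtype=torch.float)  # this is y
loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
loss = loss_f_none(x1, x2, target)
# Row i uses x1[i] - x2[i], column j uses y[j]. Last row: x1[2] - x2[2] = 3 - 2 = 1,
# and with y = [1, 1, -1] this gives max(0, -y * 1) = [0, 0, 1]
# Output:
# loss:tensor([[1., 1., 0.],
#         [0., 0., 0.],
#         [0., 0., 1.]])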
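When each pair has exactly one target value, the formula is applied pair by pair and yields one loss per pair, which corresponds to the diagonal of the matrix in this example. A plain-Python sketch of that element-wise case, with the function name `margin_ranking` chosen here for illustration:

```python
def margin_ranking(x1, x2, y, margin=0.0):
    """max(0, -y * (x1 - x2) + margin), applied pair by pair."""
    return [max(0.0, -yi * (a - b) + margin) for a, b, yi in zip(x1, x2, y)]


x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 2.0, 2.0]
y = [1.0, 1.0, -1.0]

print(margin_ranking(x1, x2, y))              # [1.0, 0.0, 1.0]
print(margin_ranking(x1, x2, y, margin=0.5))  # [1.5, 0.5, 1.5]
```

A positive margin also penalizes the tie in the second pair, because x1 is then required to beat x2 by at least the margin.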

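11.nn.MultiLabelMarginLoss()

Multi-label margin loss. Multi-label means that one sample carries several labels, for example one image that belongs to several classes.

Example: in a four-class task where sample x belongs to class 0 and class 3, the label is $[0,3,-1,-1]$, not $[1,0,0,1]$
$loss(x,y)=\sum_{ij}{\frac{max(0,1-(x[y[j]]-x[i]))}{x.size(0)}}$
where $y$ is the label, $i=0$ to $x.size(0)-1$, $j=0$ to $y.size(0)-1$, $y[j]≥0$, and $i≠y[j]$ for all $i$ and $j$.

The formula subtracts the output of every non-label neuron from the output of every label neuron.

A pair contributes loss only while the label neuron exceeds the non-label neuron by less than 1; the larger the gap between the two groups, the better.

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)  # the sample belongs to class 0 and class 3; labels are of type long
loss_f = nn.MultiLabelMarginLoss(reduction='none')
loss = loss_f(x, y)
# Output:
# tensor([0.8500])

# Calculation steps
x = x[0]
item_1 = (1-(x[0] - x[1])) + (1 - (x[0] - x[2]))    # label 0 against the non-labels 1 and 2
item_2 = (1-(x[3] - x[1])) + (1 - (x[3] - x[2]))    # label 3 against the non-labels 1 and 2
loss_h = (item_1 + item_2) / x.shape[0]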
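The same steps can be written once for an arbitrary sample. This is a hedged plain-Python sketch of the formula, assuming that the label list ends at the first -1; the function name `multilabel_margin` is introduced only for this example.

```python
def multilabel_margin(x, y):
    """Multi-label margin loss of one sample.

    x: list of class scores; y: label indices, terminated by the first -1.
    """
    labels = []
    for idx in y:
        if idx < 0:          # -1 marks the end of the label list
            break
        labels.append(idx)
    non_labels = [i for i in range(len(x)) if i not in labels]
    # Every label score is compared with every non-label score.
    total = sum(max(0.0, 1.0 - (x[j] - x[i])) for j in labels for i in non_labels)
    return total / len(x)


print(multilabel_margin([0.1, 0.2, 0.4, 0.8], [0, 3, -1, -1]))   # approximately 0.85
```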

12.nn.SoftMarginLoss()

Logistic loss for two-class classification
$loss(x,y)=\sum_i\frac{\log(1+exp(-y[i]*x[i]))}{x.nelement()}$
Dividing by $x.nelement()$ takes the mean over all elements; $y$ is 1 or -1

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

loss_f = nn.SoftMarginLoss(reduction='none')
loss = loss_f(inputs, target)
# Output:
# SoftMargin:  tensor([[0.8544, 0.4032],[0.4741, 0.9741]])

# Check the first element by hand
idx = 0
inputs_i = inputs[idx, idx]
target_i = target[idx, idx]

loss_h = torch.log(1 + torch.exp(-target_i * inputs_i))
# Output: tensor(0.8544)

13.nn.MultiLabelSoftMarginLoss()

The multi-label version of $SoftMarginLoss$
$loss(x,y)=-\frac{1}{C}*\sum_i \left[ y[i]*\log(\frac{1}{1+exp(-x[i])})+(1-y[i])*\log(\frac{exp(-x[i])}{1+exp(-x[i])}) \right]$
$C$ is the number of classes

For a 4-class task, $y=[1,0,0,1]$ means the sample belongs to the first and the fourth class; this label format differs from the one used by the multi-label margin loss.

inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)  # labels are of type float
loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
loss = loss_f(inputs, target)
# Output: MultiLabel SoftMargin:  tensor([0.5429])

# Calculation by hand
i_0 = torch.log(torch.exp(-inputs[0, 0]) / (1 + torch.exp(-inputs[0, 0])))
i_1 = torch.log(1 / (1 + torch.exp(-inputs[0, 1])))
i_2 = torch.log(1 / (1 + torch.exp(-inputs[0, 2])))
loss_h = (i_0 + i_1 + i_2) / -3

14.nn.MultiMarginLoss()

Computes the hinge loss for multi-class classification
$loss(x,y)=\frac{\sum_i max(0,margin-(x[y]-x[i]))^p}{x.size(0)}$
where $i∈\{0,…,x.size(0)-1\}$, $0≤y≤x.size(0)-1$, and $i≠y$

$y$ is of type $torch.long$; $[1,2,1]$ means that the first sample belongs to class 1, the second to class 2 and the third to class 1

The score at the label position minus each non-label score; $i$ never equals the label index

Main parameters: weight sets a loss weight per class; margin is the boundary value, default 1; p can be 1 or 2, default 1

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)  # labels are of type long

loss_f = nn.MultiMarginLoss(reduction='none')
loss = loss_f(x, y)
# Output: Multi Margin Loss:  tensor([0.8000, 0.7000])

# Check the first sample (label 1) by hand
x = x[0]
margin = 1

i_0 = margin - (x[1] - x[0])
# i_1 = margin - (x[1] - x[1])  # skipped: i equals the label
i_2 = margin - (x[1] - x[2])
loss_h = (i_0 + i_2) / x.shape[0]

print(loss_h)  # tensor(0.8000)

15.nn.TripletMarginLoss()

Triplet loss, commonly used in face recognition
$L(a,p,n)=max\{d(a_i,p_i)-d(a_i,n_i)+margin,0\}, \quad d(x_i,y_i)=||x_i-y_i||_p$ (the $p$-norm)
It works on distances between points: the $anchor-pos$ distance should be smaller than the $anchor-neg$ distance. The anchor is a photo of your own face, pos is another photo of your own face, and neg is a photo of someone else's face.

anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])

loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
loss = loss_f(anchor, pos, neg)
# Output: Triplet Margin Loss tensor(1.5000)

# Check by hand
margin = 1
a, p, n = anchor[0], pos[0], neg[0]

d_ap = torch.abs(a-p)
d_an = torch.abs(a-n)

loss = d_ap - d_an + margin

16.nn.HingeEmbeddingLoss()

Measures the similarity of two inputs; often used for nonlinear embeddings and semi-supervised learning. The input x should be the absolute value of the difference between the two inputs.
$l_n=\begin{cases} x_n, & y_n=1 \\ max\{0,\Delta-x_n\}, & y_n=-1 \end{cases} \quad \Delta=margin$

inputs = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])  # integer type

loss_f = nn.HingeEmbeddingLoss(margin=1, reduction='none')
loss = loss_f(inputs, target)
# Output: Hinge Embedding Loss tensor([[1.0000, 0.8000, 0.5000]])

# Check the third element (y = -1) by hand
margin = 1.
loss = max(0, margin - inputs.numpy()[0, 2])
print(loss)  # 0.5

17.nn.CosineEmbeddingLoss()

Uses cosine similarity to measure how similar two inputs are; used with embeddings, it measures the difference in direction

$cos(\theta)=\frac{A\cdot B}{||A||\,||B||}$

$loss(x,y)=\begin{cases} 1-cos(x_1,x_2), & y=1 \\ max(0,cos(x_1,x_2)-margin), & y=-1 \end{cases}$

margin takes values in $[-1,1]$; the recommended range is $[0,0.5]$

x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([[1, -1]], dtype=torch.float)

loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
loss = loss_f(x1, x2, target)
print("Cosine Embedding Loss", loss)

# Output: Cosine Embedding Loss tensor([[0.0167, 0.9833]])

# Check by hand
margin = 0.
def cosine(a, b):
    numerator = torch.dot(a, b)
    # torch.norm computes a norm; with p=2 it is the length of the vector
    denominator = torch.norm(a, 2) * torch.norm(b, 2)
    return float(numerator/denominator)

l_1 = 1 - (cosine(x1[0], x2[0]))              # case y = 1
l_2 = max(0, cosine(x1[0], x2[0]) - margin)   # case y = -1

print(l_1, l_2)

18.nn.CTCLoss()

Used for classifying sequential (time-series) data
