Learning Rate Scheduling Methods in PyTorch

2020-11-21  zqyadam

Introduction

PyTorch provides six learning rate scheduling methods:

- StepLR
- MultiStepLR
- ExponentialLR
- CosineAnnealingLR
- ReduceLROnPlateau
- LambdaLR

They are used to modify the learning rate as training iterates. All six methods inherit from a common base class, _LRScheduler, which has three main attributes and two main methods.

The three main attributes are:

- optimizer: the wrapped optimizer whose learning rate is being adjusted
- last_epoch: the index of the last epoch, used to keep track of where the schedule is
- base_lrs: the initial learning rate of each parameter group

The two main methods are:

- step(): update the learning rate; it should be called once per epoch
- get_lr(): compute the learning rate for the current step; each subclass implements its own version

Let's first go through each of the six learning rate scheduling methods provided by PyTorch.

Note

The learning rate should be adjusted in the epoch loop, not in the batch loop; stepping the scheduler on every batch would make the learning rate drop far too quickly. The last_epoch attribute also shows that learning rate scheduling operates at the epoch level.
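
To make the pattern concrete, here is a minimal sketch of the epoch-level usage (it assumes net, criterion and train_loader are defined as in the appendix scripts; StepLR is used only as an example):

import torch
from torch.optim.lr_scheduler import StepLR

# net, criterion and train_loader are assumed to be defined as in the appendix code.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(50):
    for x, y in train_loader:      # batch loop: only the optimizer steps here
        loss = criterion(net(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()               # epoch loop: the scheduler steps once per epoch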

StepLR

Function: adjusts the learning rate at fixed intervals, i.e. every step_size epochs. The resulting curve is a staircase that gradually decreases (gamma < 1) or increases (gamma > 1, though presumably nobody sets it that way).

Main parameters:

- step_size: the interval, in epochs, between adjustments
- gamma: the multiplicative factor applied to the learning rate at each adjustment
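
With these parameters, the learning rate at a given epoch follows the rule (as documented for StepLR):

$$\mathrm{lr}_{epoch} = \mathrm{lr}_{initial} \times \gamma^{\left\lfloor epoch \,/\, \mathrm{step\_size} \right\rfloor}$$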

Let's look at how the learning rate changes with StepLR, using gamma = 0.5 over 50 epochs:

[Figure: StepLR learning rate curve]

See the appendix for the full code.

MultiStepLR

Function: adjusts the learning rate at specified epochs.

Main parameters:

- milestones: a list of epoch indices at which to adjust the learning rate
- gamma: the multiplicative factor applied at each milestone

The figure below shows the MultiStepLR curve with milestones set to [20, 25, 35]; the learning rate changes at epochs 20, 25, and 35.

[Figure: MultiStepLR learning rate curve]

ExponentialLR

Function: adjusts the learning rate by exponential decay.

Main parameters:

- gamma: the base of the exponential decay; the learning rate is multiplied by gamma every epoch
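
The corresponding decay rule is simply:

$$\mathrm{lr}_{epoch} = \mathrm{lr}_{initial} \times \gamma^{\,epoch}$$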

The figure below shows the ExponentialLR curve with gamma = 0.9; the learning rate decays exponentially.

[Figure: ExponentialLR learning rate curve]

CosineAnnealingLR

Function: adjusts the learning rate following a cosine schedule; unlike the methods above, this one can also increase the learning rate.

Main parameters:

- T_max: the number of epochs over which the learning rate goes from its maximum to its minimum (half a cosine period)
- eta_min: the minimum learning rate, default 0

Adjustment rule:

$$\eta_t = \eta_{min} + \frac{1}{2}\left(\eta_{max} - \eta_{min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$$

The figure below shows the CosineAnnealingLR curve with T_max = 10 and eta_min left at its default of 0. The learning rate varies periodically, and the cosine period is 2 × T_max, i.e. 20 epochs. (A code sketch following the same pattern as the other scripts is included in the appendix.)

[Figure: CosineAnnealingLR learning rate curve]

ReduceLROnPlateau

Function: monitors a metric and adjusts the learning rate when the metric stops improving. Very practical; the monitored quantity can be a loss or an accuracy.

Main parameters:

- mode: "min" or "max", depending on whether the monitored metric should decrease or increase
- factor: the multiplicative factor applied to the learning rate when it is reduced
- patience: the number of epochs with no improvement to wait before reducing the learning rate
- cooldown: the number of epochs to wait after a reduction before resuming monitoring
- min_lr: a lower bound on the learning rate
- verbose: whether to print a message each time the learning rate is reduced

The figure below shows the ReduceLROnPlateau curve, with the following settings:

lr = 0.1

factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True

A fixed loss_value = 0.5 is used at first to simulate a loss that does not improve; then at the 4th epoch, loss_value is set to 0.4. The resulting curve:

[Figure: ReduceLROnPlateau learning rate curve]

The terminal prints the following:

Epoch    10: reducing learning rate of group 0 to 3.0000e-02.
Epoch    19: reducing learning rate of group 0 to 9.0000e-03.
Epoch    28: reducing learning rate of group 0 to 2.7000e-03.
Epoch    37: reducing learning rate of group 0 to 8.1000e-04.
Epoch    46: reducing learning rate of group 0 to 2.4300e-04.

Analysis

The terminal shows that the learning rate is adjusted at the 10th epoch. During epochs 0, 1, 2, 3 (the first four epochs) the learning rate is not adjusted. At epoch 3 (the 4th epoch), loss_value is manually lowered to 0.4 to simulate a decrease in the loss, so the patience counter of ReduceLROnPlateau restarts at epoch 4 (the 5th epoch). By epoch 8 (the 9th epoch) the patience limit (patience = 5) is reached, so at epoch 9 (the 10th epoch) the learning rate is multiplied by 0.3 and becomes 0.03.

After that, ReduceLROnPlateau enters the cooldown state and ignores the loss for 3 epochs (cooldown = 3), until epoch 12 (the 13th epoch). It then watches the loss again for 5 epochs; by epoch 17 (the 18th epoch) the patience limit (patience = 5) is reached again, and at epoch 18 (the 19th epoch) the learning rate is multiplied by 0.3 once more, becoming 0.009.

The pattern repeats, with further adjustments at the 28th, 37th, and 46th epochs.
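
A quick check of the arithmetic behind the printed values (a small sketch; lr and factor are the settings listed above):

lr, factor = 0.1, 0.3
for _ in range(5):
    lr *= factor          # each reduction multiplies the lr by factor
    print(f"{lr:.4e}")
# prints 3.0000e-02, 9.0000e-03, 2.7000e-03, 8.1000e-04, 2.4300e-04, matching the log above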

LambdaLR

Function: a user-defined adjustment strategy.

Main parameters:

- lr_lambda: a function (or a list of functions, one per parameter group) that takes the epoch index and returns a factor to multiply the initial learning rate by

Below, LambdaLR is used to mimic ExponentialLR with gamma set to 0.95:

lambda epoch: 0.95**epoch

The resulting curve is shown below:

[Figure: LambdaLR learning rate curve]
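
Since lr_lambda can be any function of the epoch index, other strategies are just as easy to express. A small sketch of a hypothetical warm-up-then-decay schedule (not from the original post; the optimizer here is just a placeholder):

import torch
from torch.optim.lr_scheduler import LambdaLR

optimizer = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)

warmup_epochs = 5  # hypothetical: ramp up linearly for 5 epochs, then decay by 0.95 per epoch
scheduler = LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: (epoch + 1) / warmup_epochs
    if epoch < warmup_epochs
    else 0.95 ** (epoch - warmup_epochs),
)

for epoch in range(10):
    # ... training step would go here ...
    scheduler.step()
    print(epoch, scheduler.get_last_lr())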

Appendix

The Net used in the code below is a dummy network with no real purpose.

StepLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1


# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = StepLR(optimizer=optimizer, step_size=5, gamma=0.5)

lr_list = []

for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    lr_list.append(optimizer.param_groups[0]['lr'])  # record the lr before stepping, as in the other scripts
    scheduler.step()


# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['StepLR:gamma=0.5'])
plt.show()
print(scheduler)

MultiStepLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1

# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = MultiStepLR(optimizer=optimizer, milestones=[20, 25, 35], gamma=0.5)

lr_list = []

for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    lr_list.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['MultiStepLR:gamma=0.5'])
plt.show()
print(scheduler)

ExponentialLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1
gamma = 0.9

# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ExponentialLR(optimizer=optimizer, gamma=gamma)

lr_list = []

for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    lr_list.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['ExponentialLR: gamma={}'.format(gamma)])
plt.show()
print(scheduler)
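
CosineAnnealingLR code

This script was not part of the original appendix; it is a sketch that follows the same pattern as the scripts above, using T_max = 10 and eta_min left at its default of 0 (the settings described in the CosineAnnealingLR section).

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1
T_max = 10

# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)

train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        return self.layer(x)


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = CosineAnnealingLR(optimizer=optimizer, T_max=T_max)  # eta_min defaults to 0

lr_list = []

for i in range(50):
    lr_list.append(optimizer.param_groups[0]['lr'])
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    scheduler.step()

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['CosineAnnealingLR: T_max={}'.format(T_max)])
plt.show()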

ReduceLROnPlateau code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1

loss_value = 0.5

factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True

# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ReduceLROnPlateau(optimizer=optimizer, mode=mode, factor=factor, patience=patience,
                              verbose=verbose, cooldown=cooldown, min_lr=min_lr)

lr_list = []

for i in range(50):
    lr_list.append(optimizer.param_groups[0]['lr'])
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Manually simulate a decrease in the loss
    if i == 3:
        loss_value = 0.4

    scheduler.step(loss_value)

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['ReduceLROnPlateau'])
plt.show()
print(scheduler)

LambdaLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1


# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = LambdaLR(optimizer=optimizer, lr_lambda=lambda epoch: 0.95**epoch)  # mimic ExponentialLR


lr_list = []

for i in range(50):
    lr_list.append(scheduler.get_last_lr()[0])  # get_last_lr() returns one value per param group
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    scheduler.step()

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['LambdaLR'])
plt.show()
print(scheduler)
