Learning Rate Schedulers in PyTorch
Introduction
PyTorch ships a number of learning rate schedulers. Six commonly used ones are:
StepLR
MultiStepLR
ExponentialLR
CosineAnnealingLR
ReduceLROnPlateau
LambdaLR
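They modify the learning rate as training proceeds. They build on the base class _LRScheduler (exposed as LRScheduler since PyTorch 2.0; in older releases ReduceLROnPlateau was a separate class that did not inherit from it). This base class has three main attributes and two main methods.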
The three main attributes are:
- optimizer: the associated optimizer
- last_epoch: the epoch counter
- base_lrs: the initial learning rate of each parameter group
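The two main methods are:
- step(): update the learning rate for the next epoch
- get_last_lr(): return the learning rates last computed by the scheduler (a list with one entry per parameter group)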
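A minimal sketch, assuming a recent PyTorch version, that inspects these attributes and methods on a StepLR instance. The single dummy parameter and the step_size=2 setting are arbitrary choices for illustration only:

```python
import torch
from torch.optim.lr_scheduler import StepLR

# A single dummy parameter is enough to build an optimizer (illustrative only).
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = StepLR(optimizer, step_size=2, gamma=0.5)

print(scheduler.optimizer is optimizer)  # True: the associated optimizer
print(scheduler.base_lrs)                # [0.1]: initial lr of each param group
print(scheduler.last_epoch)              # 0: the constructor already performed one initial step

history = []
for epoch in range(4):
    optimizer.step()   # stands in for a full training epoch
    scheduler.step()   # last_epoch += 1, lr recomputed for the next epoch
    history.append((scheduler.last_epoch, scheduler.get_last_lr()[0]))
print(history)
```

Note that get_last_lr() returns a list, which is why the sketch indexes it with [0].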
Below, each of the six schedulers provided by PyTorch is introduced in turn.
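Note
For the schedulers covered here, whose parameters (step_size, milestones, T_max, patience and so on) are counted in epochs, call scheduler.step() once per epoch in the epoch loop, not inside the batch loop. Stepping per batch applies the schedule once per batch and makes the learning rate drop far too quickly. The attribute name last_epoch reflects this epoch-level usage, although the counter simply counts step() calls. (Some other PyTorch schedulers, such as OneCycleLR and CyclicLR, are designed to be stepped per batch.)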
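To make the effect concrete, here is a small sketch, assuming a recent PyTorch version, that runs the same StepLR (step_size=5, gamma=0.5) for 10 epochs of 50 hypothetical batches, once stepping per epoch and once per batch. The helper final_lr and the batch count are illustrative assumptions:

```python
import torch
from torch.optim.lr_scheduler import StepLR


def final_lr(step_per_batch: bool, epochs: int = 10, batches_per_epoch: int = 50) -> float:
    """Return the lr after training, stepping the scheduler per batch or per epoch."""
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=0.1)
    scheduler = StepLR(optimizer, step_size=5, gamma=0.5)  # step_size is meant in epochs
    for epoch in range(epochs):
        for batch in range(batches_per_epoch):
            optimizer.step()  # placeholder for forward/backward/update
            if step_per_batch:
                scheduler.step()  # wrong place for an epoch-based schedule
        if not step_per_batch:
            scheduler.step()  # once per epoch, as intended
    return optimizer.param_groups[0]["lr"]


print(final_lr(step_per_batch=False))  # 0.1 * 0.5**2 = 0.025
print(final_lr(step_per_batch=True))   # 0.1 * 0.5**100, essentially zero
```

With per-batch stepping the "every 5 epochs" schedule fires every 5 batches, so after 500 batches the learning rate has been halved 100 times.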
StepLR
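Function: adjust the learning rate at equal intervals, i.e. every fixed number of epochs. The resulting curve is a staircase that steps down (gamma < 1) or up (gamma > 1, which hardly anyone would use).
Main parameters:
- step_size: the interval, in epochs, between adjustments
- gamma: the adjustment factor; at each adjustment the current learning rate is multiplied by it: lr = lr * gamma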
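Applying lr = lr * gamma once every step_size epochs is equivalent to the closed form lr = base_lr * gamma ** (epoch // step_size). A minimal pure-Python sketch of that rule (the function name step_lr is illustrative, not a PyTorch API):

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Closed form of StepLR: multiply by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Same settings as the StepLR appendix code: lr=0.1, step_size=5, gamma=0.5.
lrs = [step_lr(0.1, e, step_size=5, gamma=0.5) for e in range(15)]
print(lrs)  # 0.1 for epochs 0-4, 0.05 for 5-9, 0.025 for 10-14
```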
Let's look at how the StepLR learning rate evolves, with gamma = 0.5 over 50 epochs:
See the appendix for the full code.
MultiStepLR
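Function: adjust the learning rate at given epochs
Main parameters:
- milestones: the epochs at which to adjust, given as a list of integers; for example, [50, 125, 180] means the learning rate is adjusted at epochs 50, 125 and 180
- gamma: the adjustment factor, with the same meaning as gamma in StepLR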
The figure below shows the MultiStepLR curve with milestones set to [20, 25, 35]; the learning rate changes at epochs 20, 25 and 35.
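The MultiStepLR rule can be written in closed form: gamma is applied once for every milestone that has been reached, so lr = base_lr * gamma ** (number of milestones <= epoch). A minimal pure-Python sketch using the standard bisect module (multistep_lr is an illustrative name, not a PyTorch function), with the settings from the appendix code:

```python
from bisect import bisect_right


def multistep_lr(base_lr: float, epoch: int, milestones: list, gamma: float) -> float:
    """Closed form of MultiStepLR: one factor of gamma per milestone <= epoch."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


milestones = [20, 25, 35]
for epoch in (19, 20, 24, 25, 34, 35, 49):
    print(epoch, multistep_lr(0.1, epoch, milestones, gamma=0.5))
```

bisect_right counts how many milestones are less than or equal to epoch, which is exactly how many times the learning rate has been multiplied by gamma.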
ExponentialLR
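Function: decay the learning rate exponentially
Main parameters:
- gamma: the base of the exponent, usually close to 1 (e.g. 0.9). After every epoch the learning rate is multiplied by gamma (lr = lr * gamma), i.e. lr = base_lr * gamma**epoch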
The figure below shows the ExponentialLR curve with gamma = 0.9; the learning rate decreases exponentially.
CosineAnnealingLR
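Function: adjust the learning rate along a cosine curve; with this schedule the learning rate can also increase again
Main parameters:
- T_max: the decay period, i.e. half of the cosine period (the number of epochs to go from the initial learning rate down to eta_min)
- eta_min: the lower bound of the learning rate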
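Update rule:
$$\eta_t = \eta_{min} + \frac{1}{2}\left(\eta_{max} - \eta_{min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$$
where $\eta_{max}$ is the initial learning rate and $T_{cur}$ is the number of epochs elapsed (last_epoch).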
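The figure below shows the CosineAnnealingLR curve with T_max = 10; eta_min is not set and defaults to 0. The learning rate varies periodically, and the period of the cosine is 2 * T_max, i.e. 20 epochs.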
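The appendix has no CosineAnnealingLR script, so here is a minimal sketch, assuming a recent PyTorch version, that steps CosineAnnealingLR with T_max = 10 and eta_min = 0 (as in the figure) on a single dummy parameter and compares the recorded learning rates with the closed-form rule eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max)) / 2:

```python
import math

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

base_lr, T_max, eta_min = 0.1, 10, 0.0


def cosine_lr(epoch: int) -> float:
    """Closed-form cosine annealing with eta_max = base_lr and T_cur = epoch."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max)) / 2


param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter
optimizer = torch.optim.SGD([param], lr=base_lr)
scheduler = CosineAnnealingLR(optimizer, T_max=T_max, eta_min=eta_min)

lr_list = []
for epoch in range(50):
    lr_list.append(optimizer.param_groups[0]["lr"])  # lr used in this epoch
    optimizer.step()   # placeholder for a real training epoch
    scheduler.step()

max_diff = max(abs(a - cosine_lr(e)) for e, a in enumerate(lr_list))
print(max_diff < 1e-9)  # True
```

The learning rate reaches eta_min at epoch 10 and climbs back to the initial value at epoch 20, which is the 2 * T_max period visible in the figure.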
ReduceLROnPlateau
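Function: monitor a metric and reduce the learning rate when it stops improving. Very practical; it can monitor, for example, the loss or the accuracy.
Main parameters:
- mode: min or max. In min mode the learning rate is reduced when the monitored metric stops decreasing; in max mode, when it stops increasing
- factor: the reduction factor, equivalent to gamma in StepLR (new_lr = lr * factor)
- patience: how many consecutive epochs without improvement to tolerate before reducing
- cooldown: the number of epochs to wait after a reduction before monitoring resumes
- verbose: whether to print a message on each reduction (deprecated in recent PyTorch releases)
- min_lr: the lower bound of the learning rate
- eps: the minimal decay applied to the learning rate; if the difference between the old and new learning rate is smaller than eps, the update is skipped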
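Unlike the other schedulers, ReduceLROnPlateau.step() must be given the monitored value. A minimal sketch, assuming a recent PyTorch version; the validation-loss sequence is made up for illustration, and factor=0.5, patience=2 are arbitrary:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=2)

# Hypothetical validation losses: improve for 3 epochs, then stall.
val_losses = [1.0, 0.8, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
lr_list = []
for val_loss in val_losses:
    optimizer.step()          # placeholder for one training epoch
    scheduler.step(val_loss)  # pass the monitored metric
    lr_list.append(optimizer.param_groups[0]["lr"])
print(lr_list)  # lr halves once 3 (> patience) non-improving epochs have been seen
```

The first three non-improving epochs come at positions 3, 4 and 5, so the reduction to 0.05 happens on the sixth step; patience counts how many bad epochs are tolerated, and the reduction fires on the next one.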
The figure below shows the ReduceLROnPlateau curve, with the following settings:
lr = 0.1
factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True
Here a constant loss_value = 0.5 is first used to simulate a loss that does not change; then, at the 4th epoch (i == 3), loss_value is set to 0.4. The resulting curve:
The terminal prints:
Epoch 10: reducing learning rate of group 0 to 3.0000e-02.
Epoch 19: reducing learning rate of group 0 to 9.0000e-03.
Epoch 28: reducing learning rate of group 0 to 2.7000e-03.
Epoch 37: reducing learning rate of group 0 to 8.1000e-04.
Epoch 46: reducing learning rate of group 0 to 2.4300e-04.
Analysis
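The log shows the first reduction at epoch 10 (the epoch number printed by PyTorch counts step() calls starting from 1, so this is the step with i = 9). Epochs i = 0, 1, 2, 3 (4 epochs) cause no change. At i = 3 (the 4th epoch), loss_value is manually lowered to 0.4, simulating a decrease in the loss, so ReduceLROnPlateau resets its count of non-improving epochs. Counting restarts at i = 4 (the 5th epoch); by i = 8 (the 9th epoch) the count has reached patience = 5, and at i = 9 (the 10th epoch) it exceeds patience, so the learning rate is multiplied by 0.3 and becomes 0.03.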
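ReduceLROnPlateau then enters its cooldown state: for 3 epochs (cooldown = 3, i = 10 to 12, i.e. up to the 13th epoch) the loss is not monitored. It then watches the loss again; after 5 more epochs, at i = 17 (the 18th epoch), patience is reached again, and at i = 18 (the 19th epoch) the learning rate is reduced again, multiplied by 0.3 to 0.009.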
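The same pattern then repeats every 9 epochs (3 cooldown epochs plus 6 non-improving epochs), so the learning rate is reduced again at epochs 28, 37 and 46.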
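The bookkeeping described above can be reproduced with a simplified pure-Python sketch. This is a hypothetical re-implementation for illustration, not PyTorch's code; it assumes mode="min" with a relative improvement threshold of 1e-4 (PyTorch's default) and ignores eps:

```python
def simulate_plateau(losses, lr=0.1, factor=0.3, patience=5, cooldown=3,
                     min_lr=1e-4, threshold=1e-4):
    """Simplified sketch of ReduceLROnPlateau bookkeeping (mode="min").

    Returns (epoch_number, new_lr) for each reduction, where epoch_number
    counts step() calls starting from 1, like the log messages.
    """
    best = float("inf")
    num_bad = 0
    cooldown_counter = 0
    reductions = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best * (1 - threshold):  # counts as an improvement
            best = loss
            num_bad = 0
        else:
            num_bad += 1
        if cooldown_counter > 0:           # bad epochs are ignored during cooldown
            cooldown_counter -= 1
            num_bad = 0
        if num_bad > patience:             # strictly more than `patience` bad epochs
            lr = max(lr * factor, min_lr)
            reductions.append((epoch, lr))
            cooldown_counter = cooldown
            num_bad = 0
    return reductions


# Same setup as the experiment: 0.5 for i = 0..2, then 0.4 from i = 3 on, 50 epochs.
losses = [0.5 if i < 3 else 0.4 for i in range(50)]
for epoch, new_lr in simulate_plateau(losses):
    print(f"Epoch {epoch}: lr -> {new_lr:.4e}")
```

Running it yields reductions at epochs 10, 19, 28, 37 and 46, matching the terminal log above.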
LambdaLR
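Function: a user-defined schedule
Main parameters:
- lr_lambda: a function, or a list of functions with one per parameter group. Each function receives the epoch count (last_epoch) and returns a multiplicative factor; the learning rate is set to base_lr * lr_lambda(epoch)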
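A minimal sketch, assuming a recent PyTorch version, of the list form of lr_lambda with two parameter groups. The group learning rates and the 1/(epoch + 1) decay for the second group are hypothetical choices for illustration:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

w = torch.nn.Parameter(torch.zeros(1))
b = torch.nn.Parameter(torch.zeros(1))
# Two parameter groups with different initial learning rates.
optimizer = torch.optim.SGD([{"params": [w], "lr": 0.1},
                             {"params": [b], "lr": 0.01}], lr=0.1)
scheduler = LambdaLR(optimizer, lr_lambda=[
    lambda epoch: 0.95 ** epoch,      # group 0: exponential decay
    lambda epoch: 1.0 / (epoch + 1),  # group 1: hypothetical 1/(t+1) decay
])

for epoch in range(3):
    optimizer.step()   # placeholder for a training epoch
    scheduler.step()

print(scheduler.get_last_lr())  # [0.1 * 0.95**3, 0.01 / 4]
```

Each factor multiplies the group's initial learning rate (base_lrs), not the current one, so the lambdas describe the whole schedule as a function of the epoch.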
Below, LambdaLR is used to emulate ExponentialLR with gamma = 0.95:
lambda epoch: 0.95**epoch
The resulting curve is shown in the figure below:
Appendix
In the code below, Net is a dummy network with no practical meaning.
StepLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = StepLR(optimizer=optimizer, step_size=5, gamma=0.5)
lr_list = []
for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Record the lr used in this epoch, then advance the schedule once per epoch
    lr_list.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['StepLR:gamma=0.5'])
plt.show()
print(scheduler)
MultiStepLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = MultiStepLR(optimizer=optimizer, milestones=[20, 25, 35], gamma=0.5)
lr_list = []
for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    lr_list.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['MultiStepLR:gamma=0.5'])
plt.show()
print(scheduler)
ExponentialLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
gamma = 0.9
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ExponentialLR(optimizer=optimizer, gamma=gamma)
lr_list = []
for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    lr_list.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['ExponentialLR: gamma={}'.format(gamma)])
plt.show()
print(scheduler)
ReduceLROnPlateau code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
loss_value = 0.5
factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)
class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ReduceLROnPlateau(optimizer=optimizer, mode=mode, factor=factor, patience=patience,
                              verbose=verbose, cooldown=cooldown, min_lr=min_lr)
lr_list = []
for i in range(50):
    # Record the lr used in this epoch
    lr_list.append(optimizer.param_groups[0]['lr'])
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Manually simulate a drop in the monitored loss at the 4th epoch
    if i == 3:
        loss_value = 0.4
    # ReduceLROnPlateau needs the monitored metric passed to step()
    scheduler.step(loss_value)
# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['ReduceLROnPlateau'])
plt.show()
print(scheduler)
LambdaLR code
import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, TensorDataset
lr = 0.1
# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()
train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_data = iter(train_loader)
train_x, train_y = next(train_data)
class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x
net = Net()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = LambdaLR(optimizer=optimizer, lr_lambda=lambda epoch: 0.95**epoch)  # emulate ExponentialLR
lr_list = []
for i in range(50):
    # get_last_lr() returns a list (one entry per param group); keep group 0
    lr_list.append(scheduler.get_last_lr()[0])
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['LambdaLR'])
plt.show()
print(scheduler)