Image Classification Study (2): VGGNet-16
2019-08-03 · 坐下等雨
1. Introduction to VGG
In 2014, the Visual Geometry Group at the University of Oxford, together with researchers from Google DeepMind, developed a new deep convolutional neural network: VGGNet. It took second place in the classification task of ILSVRC 2014 (first place went to GoogLeNet, proposed the same year) and first place in the localization task.
VGGNet explored the relationship between the depth of a convolutional network and its performance. By successfully building networks 16 to 19 layers deep, it showed that increasing depth can, to a certain extent, improve final performance and substantially lower the error rate, while remaining very extensible: it also generalizes well when transferred to other image datasets. To this day, VGG is still used to extract image features.
2. Key Characteristics
VGGNet uses 3×3 convolution kernels and 2×2 pooling kernels throughout, and improves performance by progressively deepening the network. Growing the number of layers does not cause the parameter count to explode, because most parameters are concentrated in the last three fully connected layers. Meanwhile, two stacked 3×3 convolutional layers are equivalent to one 5×5 convolutional layer, and three stacked 3×3 layers are equivalent to one 7×7 layer, in the sense that three 3×3 layers cover the same receptive field as one 7×7 layer. However, the three 3×3 layers have only about half as many parameters as one 7×7 layer (27C² versus 49C² weights for C channels), and they apply three nonlinearities instead of one, which gives the stacked version a stronger capacity for learning features.
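The parameter comparison above can be sketched in a few lines (a standalone illustration; the channel count C and the bias-free counting are simplifying assumptions):

```python
# Compare the weight count of three stacked 3x3 convs against a single
# 7x7 conv with the same receptive field. Assumes C input and C output
# channels at every layer and ignores bias terms.
def conv_params(kernel, in_ch, out_ch):
    # A conv layer holds kernel * kernel * in_ch * out_ch weights
    return kernel * kernel * in_ch * out_ch

C = 512
stacked_3x3 = 3 * conv_params(3, C, C)  # three 3x3 layers in series
single_7x7 = conv_params(7, C, C)       # one 7x7 layer

print(stacked_3x3 / single_7x7)  # 27/49, roughly 0.55 -- "about half"
```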
3. Network Architecture

The structure of VGG is very clean: the convolutional part is a stack of small convolution kernels, ReLU activations, and small pooling layers, followed by fully connected layers at the end. Since the network was built for the ImageNet competition, the final layer should have 1000 outputs (the figure shown is incorrect on this point), followed by a softmax.
Because CIFAR-10 images are only 32×32, padding=1 is added at every convolutional layer to prevent the feature maps from shrinking below the kernel size. Each 3×3 convolution then leaves the spatial size unchanged, and only the pooling layers halve it. Since the network has 5 pooling layers, the feature map shrinks to 32/2⁵ = 1, with 512 channels, so the fully connected layers start from a 1×1×512 input. The last layer has 10 outputs (ten classes), followed by a softmax.
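This size bookkeeping is easy to verify with a short standalone sketch (the list below mirrors the VGG16 configuration used in the code that follows):

```python
# Trace the spatial size of a CIFAR-10 image through the VGG16 stack:
# integers denote 3x3 convs (padding=1, stride=1, size-preserving),
# 'M' denotes a 2x2 max pool with stride 2, which halves the size.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

size = 32  # CIFAR-10 images are 32x32
for v in cfg:
    if v == 'M':
        size //= 2  # only the pooling layers change the spatial size

print(size)  # -> 1: the classifier receives a 1x1x512 feature map
```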
4. PyTorch Implementation
4.1 Building the VGG network
Write the main.py file as follows; training runs on the GPU when one is available.
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn as nn
import os

transform = transforms.Compose(
    [transforms.ToTensor(),  # convert images to tensors
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])  # map images from [0, 1] to [-1, 1]
trainset = torchvision.datasets.CIFAR10(root='./cv/cifar10',
                                        train=True,
                                        download=True,
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=32,
                                          shuffle=True,
                                          num_workers=0)
testset = torchvision.datasets.CIFAR10(root='./cv/cifar10',
                                       train=False,
                                       download=True,
                                       transform=transform)
testloader = torch.utils.data.DataLoader(testset,
                                         batch_size=32,
                                         shuffle=False,
                                         num_workers=0)
# CIFAR-10 class names
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog',
           'frog', 'horse', 'ship', 'truck']
# VGG16 configuration: integers are conv output channels, 'M' is max pooling
# (note: the original listing had a typo, 521 instead of 512)
cfg = {'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512,
                 512, 'M', 512, 512, 512, 'M']}
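As a standalone sanity check (duplicating the configuration, with the channel typo corrected), counting the entries confirms the 16 weight layers that give VGG16 its name:

```python
# 13 conv layers in the feature extractor plus 3 fully connected layers
# in the classifier = 16 weight layers (pooling layers have no weights).
cfg = {'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512,
                 512, 'M', 512, 512, 512, 'M']}
conv_layers = sum(1 for v in cfg['VGG16'] if v != 'M')
print(conv_layers, conv_layers + 3)  # -> 13 16
```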
# Define the network
class VGG(nn.Module):
    def __init__(self, net_name):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[net_name])
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(512, 512),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(512, 512),
            nn.ReLU(True),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten the 1x1x512 feature map
        x = self.classifier(x)
        return x

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for v in cfg:
            if v == 'M':
                layers += [nn.MaxPool2d(2, 2)]
            else:
                layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                           nn.BatchNorm2d(v),
                           nn.ReLU(True)]
                in_channels = v
        return nn.Sequential(*layers)
# Train on the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = VGG('VGG16').to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

if __name__ == '__main__':
    # Train the network, saving a checkpoint after every epoch
    for epoch in range(20):
        train_loss = 0.0
        total = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            total += labels.size(0)
        print('epoch:{}, loss:{}'.format(epoch + 1, train_loss / total))
        print('Saving epoch {} model...'.format(epoch + 1))
        state = {
            'net': net.state_dict(),
            'epoch': epoch + 1
        }
        if not os.path.isdir('checkpoint'):
            os.mkdir('checkpoint')
        torch.save(state, './checkpoint/cifar10_epoch_{}.ckpt'.format(epoch + 1))
    print('Finished Training!')
4.2 Testing the network
Write a test.py file to evaluate the network's accuracy.
from main import *

if __name__ == '__main__':
    # Load the saved model (here, the last epoch's checkpoint)
    checkpoint = torch.load('./checkpoint/cifar10_epoch_20.ckpt')
    net.load_state_dict(checkpoint['net'])
    start_epoch = checkpoint['epoch']
    net.eval()  # switch Dropout and BatchNorm to evaluation mode

    # Inspect the predictions on the first ten test images
    dataiter = iter(testloader)
    test_images, test_labels = next(dataiter)
    test_images, test_labels = test_images.to(device), test_labels.to(device)
    outputs = net(test_images[:10])
    _, predicted = torch.max(outputs, 1)
    print('Predicted: ' + ' '.join('%5s' % classes[predicted[i]] for i in range(10)))
    print('Actual:    ' + ' '.join('%5s' % classes[test_labels[i]] for i in range(10)))

    # Measure the model's accuracy on the full test set
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
The output is as follows:
Predicted: cat ship ship plane frog dog car frog cat car
Actual:    cat ship ship plane frog frog car frog cat car
Accuracy of the network on the 10000 test images: 82 %
As the results show, after 20 epochs only one of the ten displayed test images was misclassified (a frog mistaken for a dog); the rest were all identified correctly. Overall, this VGGNet reaches 82% test accuracy, far better than LeNet.