MNIST Handwritten Digit Recognition with PyTorch (Part 2)

2020-03-31 陨星落云

Training Neural Networks

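The network we built in the previous part isn't very smart: it knows nothing about our handwritten digits. Neural networks with non-linear activation functions work like universal function approximators. There is some function F that maps your input to the output, for example, images of handwritten digits to class probabilities. The power of neural networks is that we can train them to approximate this function F. Given enough data and compute time, a network can approximate F closely, even though the function itself may be very complex.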

function_approx.png

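At first the network is naive: it doesn't know the function that maps the inputs to the outputs. We train the network by showing it examples of real data, then adjusting the network parameters so that it approximates this function F.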

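To find these parameters, we need to know how poorly the network is predicting the real outputs. For this we calculate a loss function (also called the cost), a measure of our prediction error. For example, the mean squared loss is often used in regression and binary classification problems: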
$\large \ell = \frac{1}{2n}\sum_i^n{\left(y_i - \hat{y}_i\right)^2}$
where $n$ is the number of training examples, $y_i$ are the true labels, and $\hat{y}_i$ are the predicted labels.
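As a small illustration, the formula above can be written directly with tensors. This is only a sketch: the function name `mean_squared_loss` and the example values are made up for this demonstration. Note that PyTorch's built-in `nn.MSELoss` averages with a factor of 1/n rather than 1/(2n), so the value below is half of what the built-in returns.

```python
import torch
import torch.nn.functional as F

def mean_squared_loss(y_true, y_pred):
    """Mean squared loss with the 1/(2n) factor from the formula above."""
    n = y_true.numel()
    return ((y_true - y_pred) ** 2).sum() / (2 * n)

# Made-up labels and predictions, purely for illustration
y_true = torch.tensor([1.0, 0.0, 1.0, 1.0])
y_pred = torch.tensor([0.9, 0.2, 0.6, 1.0])

loss = mean_squared_loss(y_true, y_pred)
builtin = F.mse_loss(y_pred, y_true)   # uses 1/n instead of 1/(2n)
print(loss.item(), builtin.item())
```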

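By minimizing this loss with respect to the network parameters, we can find configurations where the loss is at a minimum and the network predicts the correct labels with high accuracy. We find this minimum using gradient descent. The gradient is the slope of the loss function and points in the direction of fastest increase. To reach the minimum in the least amount of time, we follow the gradient downwards, that is, we step in the direction of the negative gradient. You can think of this as descending a mountain by following the steepest slope down.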
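The idea can be sketched on a one-dimensional toy problem. The function `gradient_descent`, the quadratic f(w) = (w - 3)^2, the step size, and the step count are all assumptions chosen for illustration; they are not part of the MNIST setup.

```python
def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad_fn(w)   # move downhill
    return w

# Toy loss f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```

With this step size the distance to the minimum shrinks by a constant factor on every step, so the result ends up very close to 3.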

gradient_descent.png

Backpropagation

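For single-layer networks, gradient descent is straightforward to implement. However, it is much more complicated for deeper, multilayer networks like the one we've built. Complicated enough that it took researchers about 30 years to work out how to train multilayer networks.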

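Training multilayer networks is done through backpropagation, which is really just an application of the chain rule from calculus. It's easiest to understand if we convert a two-layer network into a graph representation.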

backprop_diagram.png

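In the forward pass through the network, our data and operations go from bottom to top in the diagram. We pass the input $x$ through a linear transformation $L_1$ with weights $W_1$ and biases $b_1$. The output then goes through the sigmoid operation $S$ and another linear transformation $L_2$. Finally, we calculate the loss $\ell$. We use the loss as a measure of how bad the network's predictions are. The goal is then to adjust the weights and biases to minimize the loss.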

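To train the weights with gradient descent, we propagate the gradient of the loss backwards through the network. Each operation has some gradient between its inputs and outputs. As we send the gradients backwards, we multiply the incoming gradient by the gradient of the operation. Mathematically, this is just calculating the gradient of the loss with respect to the weights using the chain rule.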
$\large \frac{\partial \ell}{\partial W_1} = \frac{\partial L_1}{\partial W_1} \frac{\partial S}{\partial L_1} \frac{\partial L_2}{\partial S} \frac{\partial \ell}{\partial L_2}$
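Note: I'm glossing over a few details here that require some knowledge of vector calculus. We update the weights using this gradient with some learning rate $\alpha$.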
$\large W^\prime_1 = W_1 - \alpha \frac{\partial \ell}{\partial W_1}$
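The learning rate $\alpha$ is chosen so that the weight updates are small enough for the iterative method to settle into a minimum of the loss, yet large enough to get there in a reasonable number of iterations.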
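To make the chain rule concrete, here is a minimal sketch that computes $\partial \ell / \partial W_1$ by hand for a tiny two-layer network and compares it with what PyTorch's autograd reports. The layer sizes, the random data, the squared-error loss, and the value of `alpha` are assumptions picked for the example; the variable names mirror the symbols used above.

```python
import torch

torch.manual_seed(0)
n, d, h = 5, 4, 3                      # batch size, input size, hidden size (arbitrary)
x = torch.randn(n, d)
y = torch.randn(n, 1)
W1 = torch.randn(d, h, requires_grad=True)
b1 = torch.zeros(h, requires_grad=True)
W2 = torch.randn(h, 1, requires_grad=True)
b2 = torch.zeros(1, requires_grad=True)

# Forward pass: L1 -> S -> L2 -> loss
L1 = x @ W1 + b1
S = torch.sigmoid(L1)
L2 = S @ W2 + b2
loss = ((y - L2) ** 2).sum() / (2 * n)

# Reference gradients from autograd
loss.backward()

# Manual backward pass, multiplying gradients from the loss back to W1
with torch.no_grad():
    dL2 = (L2 - y) / n                 # d loss / d L2
    dS = dL2 @ W2.T                    # d loss / d S
    dL1 = dS * S * (1 - S)             # sigmoid derivative is S * (1 - S)
    manual_grad_W1 = x.T @ dL1         # d loss / d W1

    alpha = 0.1                        # learning rate (assumed value)
    W1_new = W1 - alpha * manual_grad_W1

print(torch.allclose(manual_grad_W1, W1.grad, atol=1e-6))
```

The last few lines apply the update rule $W^\prime_1 = W_1 - \alpha \, \partial \ell / \partial W_1$ once, using the hand-computed gradient.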

Loss Functions

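Let's start by seeing how we calculate the loss with PyTorch. Through the nn module, PyTorch provides losses such as the cross-entropy loss (nn.CrossEntropyLoss). You'll usually see the loss assigned to criterion. As noted in the last part, for a classification problem such as MNIST we use the softmax function to predict class probabilities. With a softmax output, you want to use cross-entropy as the loss. To actually calculate the loss, you first define the criterion, then pass in the output of your network and the correct labels.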

There is something really important to note here. Looking at the documentation for nn.CrossEntropyLoss:

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

The input is expected to contain scores for each class.

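This means we need to pass the raw output of our network into the loss, not the output of the softmax function. This raw output is usually called the logits or scores. We use the logits because softmax gives you probabilities that are often very close to zero or one, and floating-point numbers can't accurately represent values that close to zero or one. It's usually best to avoid doing calculations with probabilities, so we use log-probabilities instead.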
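Both points can be checked on a few made-up tensors. The sketch below first compares nn.CrossEntropyLoss on raw logits against nn.LogSoftmax followed by nn.NLLLoss, then shows why taking the log of softmax probabilities by hand is numerically fragile. The random logits, the labels, and the extreme scores in `big` are assumptions for illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical batch: 4 examples, 10 classes
labels = torch.tensor([3, 0, 9, 1])

# Cross-entropy on raw logits vs. log-softmax followed by NLL loss
ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
print(ce.item(), nll.item())

# With extreme scores, the probability of the second class underflows to 0,
# so its log becomes -inf; log_softmax keeps a finite value instead.
big = torch.tensor([1000.0, 0.0])
naive = torch.log(torch.softmax(big, dim=0))
stable = torch.log_softmax(big, dim=0)
print(naive, stable)
```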

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                              ])
# Download and load the training data
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

Note: If you are not yet familiar with nn.Sequential, see the previous part.

# Build a feed-forward network
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10))

# Define the loss
criterion = nn.CrossEntropyLoss()

# Get our data
images, labels = next(iter(trainloader))
# Flatten images
images = images.view(images.shape[0], -1)

# Forward pass, get our logits
logits = model(images)
# Calculate the loss with the logits and the labels
loss = criterion(logits, labels)

print(loss)

tensor(2.3010, grad_fn=<NllLossBackward>)

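In my experience, it's more convenient to build the model with a log-softmax output, using nn.LogSoftmax or F.log_softmax. You can then get the actual probabilities by taking the exponential, torch.exp(output). With a log-softmax output, you want to use the negative log-likelihood loss, nn.NLLLoss (see its documentation).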

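Exercise: Build a model that returns the log-softmax as its output and calculate the loss using the negative log-likelihood loss. Note that for nn.LogSoftmax and F.log_softmax you need to set the dim keyword argument appropriately. dim=0 calculates softmax across the rows, so each column sums to 1, while dim=1 calculates across the columns, so each row sums to 1. Think about what you want the output to be and pick dim accordingly.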

# TODO: Build a feed-forward network
model = nn.Sequential(nn.Linear(784,128),
                      nn.ReLU(),
                      nn.Linear(128,64),
                      nn.ReLU(),
                      nn.Linear(64,10),
                      nn.LogSoftmax(dim=1))

# TODO: Define the loss
criterion = nn.NLLLoss()

### Run this to check your work
# Get our data
images, labels = next(iter(trainloader))
# Flatten images
images = images.view(images.shape[0], -1)

# Forward pass, get our log-probabilities (the model ends in LogSoftmax)
logps = model(images)
# Calculate the loss with the log-probabilities and the labels
loss = criterion(logps, labels)

print(loss)

tensor(2.3090, grad_fn=<NllLossBackward>)
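The effect of the dim argument is easy to see on a small tensor. The following sketch uses a made-up batch of scores (3 examples, 10 classes) in place of real network output, and recovers probabilities with torch.exp as described above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(3, 10)            # hypothetical batch: 3 examples, 10 classes

# dim=1 normalizes across the classes of each example
ps = torch.exp(F.log_softmax(scores, dim=1))
print(ps.sum(dim=1))                   # each row sums to 1

# dim=0 normalizes across the batch instead, which is not what we want here
ps_wrong = torch.exp(F.log_softmax(scores, dim=0))
print(ps_wrong.sum(dim=0))             # each column sums to 1
print(ps_wrong.sum(dim=1))             # rows generally do not sum to 1
```

Since each row of the model output holds the class scores for one image, dim=1 is the choice that yields a proper probability distribution per image.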

Autograd

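Now that we know how to calculate a loss, how do we use it to perform backpropagation? Torch provides a module, autograd, for automatically calculating the gradients of tensors. We can use it to calculate the gradients of the loss with respect to all of our parameters. Autograd works by keeping track of the operations performed on tensors, then going backwards through those operations and calculating gradients along the way. To make sure PyTorch keeps track of operations on a tensor and calculates its gradients, you need to set requires_grad=True on the tensor. You can do this at creation with the requires_grad keyword, or at any time with x.requires_grad_(True).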

You can turn off gradient tracking for a block of code with torch.no_grad():

>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False

You can also turn gradients on or off altogether with torch.set_grad_enabled(True|False).
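A minimal sketch of these switches, using a small made-up tensor: `requires_grad_` flips tracking on an existing tensor in place, and `torch.set_grad_enabled` can be called as a plain function or used as a context manager.

```python
import torch

x = torch.ones(2, 2)                   # requires_grad defaults to False
x.requires_grad_(True)                 # turn tracking on in place

torch.set_grad_enabled(False)          # globally disable gradient tracking
y = x * 2
torch.set_grad_enabled(True)           # re-enable it
z = x * 2

# The same switch used as a context manager
with torch.set_grad_enabled(False):
    w = x * 2

print(x.requires_grad, y.requires_grad, z.requires_grad, w.requires_grad)
```

When the function form is used, remember to switch gradients back on afterwards; the context-manager form restores the previous state automatically.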

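The gradients are computed with respect to some variable z by calling z.backward(). This does a backward pass through the operations that created z.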

x = torch.randn(2,2, requires_grad=True)
print(x)

tensor([[-0.7619, -0.9604],
        [-0.6987,  1.2588]], requires_grad=True)

y = x**2
print(y)

tensor([[0.5805, 0.9223],
        [0.4882, 1.5845]], grad_fn=<PowBackward0>)

Below we can see the operation that created y, a power operation PowBackward0.

## grad_fn shows the function that generated this variable
print(y.grad_fn)

<PowBackward0 object at 0x000002AD868AD780>

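The autograd module keeps track of these operations and knows how to calculate the gradient for each one. In this way, it can calculate the gradients for a chain of operations with respect to any one tensor. Let's reduce the tensor y to a scalar value by taking its mean.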

z = y.mean()
print(z)

tensor(0.8938, grad_fn=<MeanBackward0>)

You can check the gradient of x; it is still empty (None) because no backward pass has been run yet.

print(x.grad)

None

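To calculate the gradients, you need to run the .backward method on a variable, z for example. This calculates the gradient of z with respect to x. Since z is the mean of the $n = 4$ elements of x**2, each partial derivative is $2x_i/n = x_i/2$:
$\frac{\partial z}{\partial x} = \frac{\partial}{\partial x}\left[\frac{1}{n}\sum_i^n x_i^2\right] = \frac{2x}{n} = \frac{x}{2}$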

z.backward()
print(x.grad)
print(x/2)

tensor([[-0.3809, -0.4802],
        [-0.3493,  0.6294]])
tensor([[-0.3809, -0.4802],
        [-0.3493,  0.6294]], grad_fn=<DivBackward0>)

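These gradient calculations are particularly useful for neural networks. For training, we need the gradients of the loss with respect to the weights. With PyTorch, we run data forward through the network to calculate the loss, then go backwards to calculate the gradients of the loss with respect to the weights. Once we have the gradients, we can take a gradient descent step.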

Loss and Gradients Together

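When we create a network with PyTorch, all of the parameters are initialized with requires_grad=True. This means that when we calculate the loss and call loss.backward(), the gradients for the parameters are calculated. These gradients are used to update the weights with gradient descent. Below you can see an example of calculating the gradients using a backward pass.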

# Build a feed-forward network
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()
images, labels = next(iter(trainloader))
images = images.view(images.shape[0], -1)

logps = model(images)  # log-probabilities, since the model ends in LogSoftmax
loss = criterion(logps, labels)

print('Before backward pass: \n', model[0].weight.grad)

loss.backward()

print('After backward pass: \n', model[0].weight.grad)

Before backward pass: 
 None
After backward pass: 
 tensor([[ 0.0002,  0.0002,  0.0002,  ...,  0.0002,  0.0002,  0.0002],
        [ 0.0058,  0.0058,  0.0058,  ...,  0.0058,  0.0058,  0.0058],
        [-0.0002, -0.0002, -0.0002,  ..., -0.0002, -0.0002, -0.0002],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0012, -0.0012, -0.0012,  ..., -0.0012, -0.0012, -0.0012],
        [ 0.0024,  0.0024,  0.0024,  ...,  0.0024,  0.0024,  0.0024]])

Training the Network

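There's one last piece we need before we can start training: an optimizer, which we'll use to update the weights with the gradients. We get these from PyTorch's optim package. For example, we can use stochastic gradient descent with optim.SGD. You can see how to define an optimizer below.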

from torch import optim

# Optimizers require the parameters to optimize and a learning rate
optimizer = optim.SGD(model.parameters(), lr=0.01)
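To see what such an optimizer does, the sketch below checks that one step of plain optim.SGD (no momentum, no weight decay) matches the update rule $W^\prime = W - \alpha \, \partial \ell / \partial W$ from earlier. The tiny `layer`, the random input, and the toy loss are assumptions made up for the demonstration.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
layer = nn.Linear(3, 2)                # tiny stand-in for a real model
optimizer = optim.SGD(layer.parameters(), lr=0.01)

x = torch.randn(4, 3)
loss = layer(x).pow(2).mean()          # toy loss, only used to produce gradients

optimizer.zero_grad()
loss.backward()

# Compute the expected weights by hand before the optimizer changes them
expected = layer.weight.detach() - 0.01 * layer.weight.grad
optimizer.step()

print(torch.allclose(layer.weight.detach(), expected))
```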

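Now that we know how to use all the individual parts, it's time to see how they work together. Let's consider just one learning step before looping through all the data. The general process with PyTorch: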

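Make a forward pass through the network
Use the network output to calculate the loss
Perform a backward pass through the network with loss.backward() to calculate the gradients
Take a step with the optimizer (stochastic gradient descent) to update the weights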

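Below I'll go through one training step and print out the weights and gradients so you can see how they change. Note that I have a line of code, optimizer.zero_grad(). When you do multiple backward passes with the same parameters, the gradients are accumulated. This means you need to zero the gradients on each training pass, or you'll retain gradients from previous training batches.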

print('Initial weights - ', model[0].weight)

images, labels = next(iter(trainloader))
# Flatten the images; view() also works if a batch has fewer than 64 images
images = images.view(images.shape[0], -1)

# Clear the gradients, do this because gradients are accumulated
optimizer.zero_grad()

# Forward pass, then backward pass, then update weights
output = model(images)
loss = criterion(output, labels)
loss.backward()
print('Gradient -', model[0].weight.grad)

Initial weights -  Parameter containing:
tensor([[ 0.0310, -0.0118, -0.0347,  ..., -0.0017,  0.0066, -0.0122],
        [ 0.0007,  0.0074,  0.0232,  ..., -0.0357,  0.0227,  0.0052],
        [ 0.0121, -0.0286,  0.0265,  ...,  0.0174,  0.0127, -0.0132],
        ...,
        [ 0.0347,  0.0156,  0.0166,  ...,  0.0303, -0.0136,  0.0295],
        [-0.0146, -0.0036,  0.0253,  ..., -0.0104,  0.0069,  0.0213],
        [ 0.0028,  0.0031, -0.0184,  ..., -0.0025,  0.0256, -0.0037]],
       requires_grad=True)
Gradient - tensor([[ 0.0002,  0.0002,  0.0002,  ...,  0.0002,  0.0002,  0.0002],
        [-0.0004, -0.0004, -0.0004,  ..., -0.0004, -0.0004, -0.0004],
        [-0.0001, -0.0001, -0.0001,  ..., -0.0001, -0.0001, -0.0001],
        ...,
        [-0.0003, -0.0003, -0.0003,  ..., -0.0003, -0.0003, -0.0003],
        [-0.0002, -0.0002, -0.0002,  ..., -0.0002, -0.0002, -0.0002],
        [ 0.0013,  0.0013,  0.0013,  ...,  0.0013,  0.0013,  0.0013]])

# Take an update step and view the new weights
optimizer.step()
print('Updated weights - ', model[0].weight)

Updated weights -  Parameter containing:
tensor([[ 0.0310, -0.0118, -0.0347,  ..., -0.0017,  0.0066, -0.0122],
        [ 0.0007,  0.0074,  0.0232,  ..., -0.0357,  0.0227,  0.0052],
        [ 0.0121, -0.0286,  0.0265,  ...,  0.0174,  0.0127, -0.0132],
        ...,
        [ 0.0347,  0.0156,  0.0166,  ...,  0.0303, -0.0136,  0.0296],
        [-0.0146, -0.0035,  0.0253,  ..., -0.0104,  0.0069,  0.0213],
        [ 0.0028,  0.0031, -0.0184,  ..., -0.0025,  0.0256, -0.0037]],
       requires_grad=True)
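The gradient accumulation that optimizer.zero_grad() guards against can be demonstrated on a single made-up tensor. In this sketch the gradient of sum(w**2) is 2 * w, so each backward pass should contribute [2, 4]; calling `w.grad.zero_()` plays the role that `optimizer.zero_grad()` plays for a whole model.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: gradient is 2 * w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# Zero the gradient, then run backward again to get a fresh value
w.grad.zero_()
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
```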

Training for Real

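Now we'll put this algorithm into a loop so we can go through all the images. One pass through the entire dataset is called an epoch. So here we loop through trainloader to get our training batches. For each batch, we do a training pass where we calculate the loss, do a backward pass, and update the weights.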

Exercise: Implement the training pass for our network. If you implemented it correctly, you should see the training loss drop with each epoch.

## Your solution here

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003)

epochs = 10
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten MNIST images into a 784 long vector
        images = images.view(images.shape[0], -1)
    
        # TODO: Training pass
        optimizer.zero_grad()
        
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step() 
        
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss/len(trainloader)}")

Training loss: 1.9486986867654552
Training loss: 0.8663170544831738
Training loss: 0.5123799103917852
Training loss: 0.4224105821108259
Training loss: 0.38050861858419266
Training loss: 0.3551620597651264
Training loss: 0.33704469087662725
Training loss: 0.3233772503859453
Training loss: 0.31237514997755034
Training loss: 0.3026550426952112

With the network trained, we can check out its predictions.

%matplotlib inline
import helper

images, labels = next(iter(trainloader))

img = images[0].view(1, 784)
# Turn off gradients to speed up this part
with torch.no_grad():
    logps = model(img)

# Output of the network are log-probabilities, need to take exponential for probabilities
ps = torch.exp(logps)
helper.view_classify(img.view(1, 28, 28), ps)

output_31_0.png

Now our network does a good job: it can accurately predict the digits in our images.
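Looking at one image is only a spot check. One way to put a number on "accurately" is to compare the most likely class with the true label over a batch. The helper `accuracy` below is a hypothetical function, not part of the article's code, and the log-probabilities and labels are made up so the example runs without the dataset; with the trained model you would pass in `model(images)` and `labels` instead.

```python
import torch

def accuracy(log_ps, labels):
    """Fraction of examples whose most likely class matches the label."""
    preds = log_ps.argmax(dim=1)       # index of the highest log-probability
    return (preds == labels).float().mean().item()

# Made-up log-probabilities for 4 examples and 3 classes
log_ps = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.3, 0.3, 0.4],
                                 [0.6, 0.3, 0.1]]))
labels = torch.tensor([0, 1, 2, 1])

acc = accuracy(log_ps, labels)
print(acc)
```

Because the logarithm is monotonic, taking argmax over log-probabilities gives the same prediction as taking it over the probabilities themselves.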
