cs231n assignment2(3)
The third part of assignment2 is about getting familiar with a deep learning framework, either PyTorch or TensorFlow; here PyTorch is used. This part introduces PyTorch at three levels of abstraction: Barebones, Module API, and Sequential API.
Barebones
At this level, we work directly with the functions PyTorch provides: we define the network architecture ourselves and also write the forward pass and the training loop by hand; the gradients of the parameters, however, are computed automatically by PyTorch's autograd.
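The snippets below rely on a few globals defined in the notebook's setup cells (dtype, device, print_every, and the CIFAR-10 data loaders). A minimal sketch of that setup, assuming the standard notebook structure:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

dtype = torch.float32
# use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print_every = 100  # how often to report the training loss
# loader_train and loader_val are assumed to be torch.utils.data.DataLoader
# objects over the CIFAR-10 training and validation splits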
Below is the code that defines a two-layer neural network:
import torch.nn.functional as F  # useful stateless functions

def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    NN is fully connected -> ReLU -> fully connected layer.
    Note that this function only defines the forward pass;
    PyTorch will take care of the backward pass for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]

    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x

def two_layer_fc_test():
    hidden_layer_size = 42
    x = torch.zeros((64, 50), dtype=dtype)  # minibatch size 64, feature dimension 50
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]

two_layer_fc_test()
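The `flatten` helper used above is defined earlier in the notebook; a minimal sketch consistent with how it is called here:

def flatten(x):
    N = x.shape[0]  # read in N, C, H, W
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image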
Below is the function that trains the model with SGD:
def train_part2(model_fn, params, learning_rate):
    """
    Train a model on CIFAR-10.

    Inputs:
    - model_fn: A Python function that performs the forward pass of the model.
      It should have the signature scores = model_fn(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD

    Returns: Nothing
    """
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph have requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()

        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad

                # Manually zero the gradients after running the backward pass
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)
            print()
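The check_accuracy_part2 function called above is defined elsewhere in the notebook; a minimal sketch consistent with its call signature might look like this:

def check_accuracy_part2(loader, model_fn, params):
    """Check the accuracy of a barebones classification model."""
    num_correct, num_samples = 0, 0
    with torch.no_grad():  # no need to build a graph during evaluation
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)
            y = y.to(device=device, dtype=torch.long)
            scores = model_fn(x, params)
            _, preds = scores.max(1)  # predicted class = argmax over scores
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))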
In fact, the parameter update in the training loop could also be handled by an optimizer from torch.optim, so that we do not have to write the update step ourselves.
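For example, since the tensors in params have requires_grad=True, the manual update above could be replaced with an optim.SGD step. A hypothetical sketch of that variant (not part of the assignment code):

optimizer = optim.SGD(params, lr=learning_rate)  # wraps the raw weight tensors
for t, (x, y) in enumerate(loader_train):
    x = x.to(device=device, dtype=dtype)
    y = y.to(device=device, dtype=torch.long)
    loss = F.cross_entropy(model_fn(x, params), y)
    optimizer.zero_grad()  # clear old gradients before the backward pass
    loss.backward()
    optimizer.step()       # performs the SGD update for every tensor in params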
Module API
The Barebones level feels much like numpy: we build the network out of low-level tensor operations, and apart from not having to compute gradients ourselves, the experience is essentially the same as working in numpy. With the Module API, by contrast, we can build the network directly out of the standard layers provided by nn.Module.
Below is a two-layer fully-connected network built with the Module API:
class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        nn.init.kaiming_normal_(self.fc2.weight)

    def forward(self, x):
        # forward always defines connectivity
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores

def test_TwoLayerFC():
    input_size = 50
    x = torch.zeros((64, input_size), dtype=dtype)  # minibatch size 64, feature dimension 50
    model = TwoLayerFC(input_size, 42, 10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]

test_TwoLayerFC()
In the definition above, we write our own network class by subclassing nn.Module. In `__init__` we define the architecture out of the basic layers the framework provides, and in `forward` we describe the forward pass by feeding the input x through those layers in order; there is no need to spell out the matrix multiplications as in the previous section. To use the model, we simply instantiate the class with the desired sizes.
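Because nn.Module tracks every layer assigned in `__init__`, the model's weights can be gathered without listing them by hand; for instance:

model = TwoLayerFC(3 * 32 * 32, 42, 10)
# parameters() yields fc1.weight, fc1.bias, fc2.weight, fc2.bias in turn,
# which is exactly what an optimizer needs
print([p.shape for p in model.parameters()])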
Training is also slightly different from the previous section: the network is trained via torch.optim, as follows:
def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.

    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for

    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()
The first half of the loop is the same as before: compute the scores and the loss, then call loss.backward() to obtain the gradients. The parameter update, however, no longer needs to be implemented by hand; a single optimizer.step() performs it, where the optimizer is constructed by the caller.
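Like check_accuracy_part2, the check_accuracy_part34 function is defined elsewhere in the notebook; a minimal sketch consistent with its use here:

def check_accuracy_part34(loader, model):
    num_correct, num_samples = 0, 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))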
Here is the complete code for this part:
hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
train_part34(model, optimizer)
Sequential API
At the Module API level, the workflow is roughly: define the architecture in `__init__`, describe the forward pass in `forward`, write the training loop with torch.optim, and finally plug in the parameters and train. At the Sequential API level, the "describe the forward pass in `forward`" step can be dropped: once the layers are declared, the framework chains them together automatically, while the remaining steps still have to be written as before.
Below is the same two-layer network defined with the Sequential API:
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

train_part34(model, optimizer)
Compared with the earlier versions, the network definition is noticeably more concise.
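As a side note, recent PyTorch releases (1.2 and later) ship a built-in nn.Flatten module, so the hand-written Flatten wrapper above could be dropped:

model = nn.Sequential(
    nn.Flatten(),  # built-in replacement for the custom Flatten module
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)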
Summary
This final part of assignment2 builds familiarity with the PyTorch framework; when building more complex neural network projects later on, using PyTorch will make the work considerably more efficient.