PyTorch Study Notes

2019-05-01 捡个七

Tensor

In PyTorch, the Tensor takes the place of NumPy's array, and it can use a GPU to accelerate computation. Some examples follow.

# Construct a 5x3 matrix without initializing it (the values are whatever happened to be in memory)
In [2]: x = torch.empty(5, 3)

In [3]: x
Out[3]:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 1.6591e-42, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])

# Create a randomly initialized matrix
In [4]: x = torch.rand(5, 3)

In [5]: x
Out[5]:
tensor([[0.1827, 0.6145, 0.4705],
        [0.9300, 0.1851, 0.8518],
        [0.6496, 0.9484, 0.9402],
        [0.3485, 0.1584, 0.4074],
        [0.8732, 0.4362, 0.3022]])

# Create a tensor directly from data

In [8]: x = torch.tensor([5.5, 3])

In [9]: x
Out[9]: tensor([5.5000, 3.0000])

# A new tensor can also be created from an existing one (it reuses properties such as dtype unless they are overridden)
In [11]: x = x.new_ones(5, 3, dtype=torch.double)

In [12]: x
Out[12]:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [13]: x = torch.randn_like(x, dtype=torch.float)

In [14]: x
Out[14]:
tensor([[-0.4083,  0.3295,  2.0638],
        [-0.1591,  2.4042, -0.1991],
        [ 0.1514, -0.3264,  0.6047],
        [ 0.2679,  0.1346,  0.4984],
        [-0.3599,  1.1166, -1.6757]])

In [15]: x.size()
Out[15]: torch.Size([5, 3])
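The snippet below is a small sketch of two details from the session above. First, `new_*` methods inherit the source tensor's dtype unless you override it. Second, `x.size()` returns a `torch.Size`, which behaves like a tuple. The variable names are illustrative only.

```python
import torch

# torch.zeros with an explicit dtype
z = torch.zeros(5, 3, dtype=torch.long)
print(z.dtype)  # torch.int64

# new_* methods reuse the source tensor's dtype and device unless overridden
d = torch.tensor([5.5, 3.0], dtype=torch.double)
ones = d.new_ones(2, 2)
print(ones.dtype)  # torch.float64

# torch.Size is a subclass of tuple, so it supports unpacking and indexing
rows, cols = z.size()
print(rows, cols)  # 5 3
```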

In-place operations

Basic arithmetic (addition, subtraction, multiplication, division and so on) is supported, and PyTorch also supports in-place operations. Any operation that mutates a tensor in place is post-fixed with an _. For example, x.copy_(y) and x.t_() both change x.

In [16]: x = x.new_ones(5, 3)

In [17]: x
Out[17]:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [19]: y = torch.rand(5, 3)

In [20]: y
Out[20]:
tensor([[0.7964, 0.3754, 0.4626],
        [0.0144, 0.7200, 0.8935],
        [0.8000, 0.6143, 0.1074],
        [0.4493, 0.2590, 0.3288],
        [0.8548, 0.2911, 0.9160]])

In [21]: y.add_(x)
Out[21]:
tensor([[1.7964, 1.3754, 1.4626],
        [1.0144, 1.7200, 1.8935],
        [1.8000, 1.6143, 1.1074],
        [1.4493, 1.2590, 1.3288],
        [1.8548, 1.2911, 1.9160]])
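As a minimal sketch, here is the contrast between the out-of-place forms (which leave `y` untouched) and the in-place `add_()` (which updates `y`'s own storage). It also shows the `out=` argument of `torch.add`. The tensor sizes are arbitrary.

```python
import torch

x = torch.ones(2, 3)
y = torch.zeros(2, 3)

# Out-of-place forms return a new tensor and leave y untouched
s1 = x + y
s2 = torch.add(x, y)
s3 = y.add(x)
print(y.sum())  # tensor(0.)

# out= writes the result into a preallocated tensor
result = torch.empty(2, 3)
torch.add(x, y, out=result)

# The trailing underscore marks the in-place version: y's own storage is updated
ptr = y.data_ptr()
y.add_(x)
print(y.sum())              # tensor(6.)
print(y.data_ptr() == ptr)  # True, still the same memory
```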

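Reshaping

Unlike some other frameworks, PyTorch uses x.size() to inspect a tensor's shape and x.view() to change it. The .shape attribute (no parentheses; it returns the same torch.Size as x.size()) and the .reshape() method are supported as well. Passing -1 for one dimension lets PyTorch infer it from the remaining ones, as in x.view(-1, 8) below.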

In [22]: x = torch.randn(4,4)

In [23]: x
Out[23]:
tensor([[-0.5509, -1.7114, -0.7114,  0.7159],
        [ 1.0975,  0.8329,  0.6897,  1.1666],
        [ 0.4866, -0.3604,  0.2418,  0.6123],
        [ 0.7343,  0.2889,  0.3440, -0.4988]])

In [24]: y = x.view(16)

In [25]: y
Out[25]:
tensor([-0.5509, -1.7114, -0.7114,  0.7159,  1.0975,  0.8329,  0.6897,  1.1666,
         0.4866, -0.3604,  0.2418,  0.6123,  0.7343,  0.2889,  0.3440, -0.4988])

In [26]: y.size()
Out[26]: torch.Size([16])

In [27]: z = x.view(-1, 8)

In [28]: z
Out[28]:
tensor([[-0.5509, -1.7114, -0.7114,  0.7159,  1.0975,  0.8329,  0.6897,  1.1666],
        [ 0.4866, -0.3604,  0.2418,  0.6123,  0.7343,  0.2889,  0.3440, -0.4988]])

In [29]: z.size()
Out[29]: torch.Size([2, 8])
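The sketch below shows how `view()` and `reshape()` differ. `view()` always shares memory with the original tensor and only works when the memory layout is compatible. `reshape()` returns a view when it can and copies otherwise. A transposed tensor is used here as an example of a non-contiguous layout.

```python
import torch

x = torch.arange(16.0).reshape(4, 4)

# -1 tells view() to infer that dimension from the total number of elements
z = x.view(-1, 8)
print(z.shape)  # .shape is an attribute, equivalent to z.size()

# view() returns a tensor that shares storage with x
z[0, 0] = 100.0
print(x[0, 0].item())  # 100.0

# A transpose is non-contiguous, so view() cannot flatten it, but reshape() can (by copying)
t = x.t()
print(t.is_contiguous())  # False
try:
    t.view(16)
except RuntimeError as err:
    print("view() failed:", type(err).__name__)
flat = t.reshape(16)
print(flat.shape)
```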

Converting between NumPy arrays and PyTorch tensors

Tensor to array: use .numpy()

In [30]: a = torch.ones(5)

In [31]: a
Out[31]: tensor([1., 1., 1., 1., 1.])

In [32]: b = a.numpy()

In [33]: b
Out[33]: array([1., 1., 1., 1., 1.], dtype=float32)

The tensor and the array share the same underlying memory, so modifying a in place also changes b:

In [34]: a.add_(a)
Out[34]: tensor([2., 2., 2., 2., 2.])

In [35]: b
Out[35]: array([2., 2., 2., 2., 2.], dtype=float32)

Array to tensor: use torch.from_numpy()

In [36]: import numpy as np

In [37]: a = np.ones(5)

In [38]: a
Out[38]: array([1., 1., 1., 1., 1.])

In [39]: b = torch.from_numpy(a)

In [40]: b
Out[40]: tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

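Note that, according to the official tutorial, all tensors on the CPU except CharTensor support conversion to a NumPy array and back. A tensor on the GPU must first be copied to the CPU (for example with .cpu()) before .numpy() can be called.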
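This is a minimal sketch contrasting `torch.from_numpy()`, which shares memory with the array, with `torch.tensor()`, which always copies. The CUDA branch only runs when a GPU is present.

```python
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)  # shares memory with a
c = torch.tensor(a)      # copies the data

np.add(a, 1, out=a)      # modify the array in place
print(b)  # tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
print(c)  # tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

# A CUDA tensor has to be copied back to the CPU before calling .numpy()
if torch.cuda.is_available():
    g = torch.ones(3, device="cuda")
    print(g.cpu().numpy())
```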

CUDA Tensors

Use the .to() method to move a tensor to any device (GPU or CPU).

Check whether CUDA is available:

In [41]: torch.cuda.is_available()
Out[41]: True

Define the device as CUDA:

In [42]: device = torch.device("cuda")

Create a tensor directly on the CUDA device:

In [43]: y = torch.ones_like(x, device=device)

In [44]: y
Out[44]:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], device='cuda:0')
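The session above assumes a GPU is present. A common device-agnostic variant, sketched below, falls back to the CPU when CUDA is unavailable. It uses `.to()` both to move a tensor and to change its dtype in a single call.

```python
import torch

# Fall back to the CPU when no GPU is present, so the same script runs anywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 4)
y = torch.ones_like(x, device=device)  # created directly on `device`
x = x.to(device)                       # moved to `device`
z = x + y                              # both operands must live on the same device
print(z.device)

# .to() can change device and dtype in one call
z_cpu = z.to("cpu", torch.double)
print(z_cpu.dtype)  # torch.float64
```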

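Autograd: automatic differentiation

At the core of neural networks in PyTorch is automatic differentiation, provided by the autograd package.

torch.Tensor is the central class of the autograd package: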

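If you set a tensor's .requires_grad attribute to True, every operation on that tensor is tracked. When the computation is finished, call .backward() to compute all gradients automatically; the gradient with respect to the tensor is accumulated into its .grad attribute.
To stop a tensor from tracking history, call .detach() to detach it from the computation history, so that future computations on it are not tracked.
To prevent history tracking (and the memory it uses), you can also wrap a code block in with torch.no_grad():. This is especially useful when evaluating a model: the model may have trainable parameters with requires_grad=True, but no gradients are needed for them during evaluation.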
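The sketch below illustrates the three points above with a single scalar parameter `w`, which is a name chosen just for this example. Repeated `backward()` calls accumulate into `.grad`, which is why training loops reset it each step. `.detach()` and `torch.no_grad()` both produce results that are not tracked.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# Gradients accumulate in .grad across backward() calls
for _ in range(2):
    loss = w * w  # d(loss)/dw = 2w = 6
    loss.backward()
accumulated = w.grad.item()
print(accumulated)  # 12.0 after two calls

w.grad.zero_()  # reset before the next step, as a training loop would

# detach() returns a tensor cut off from the graph
d = (w * 2).detach()
print(d.requires_grad)  # False

# Results computed under no_grad() are not tracked either
with torch.no_grad():
    e = w * 2
print(e.requires_grad)  # False
```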

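Another class that is very important to the autograd implementation is Function:

Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each tensor has a .grad_fn attribute that references the Function that created it (except for tensors created by the user, whose grad_fn is None).
To compute derivatives, call .backward() on a tensor. If the tensor is a scalar, .backward() needs no arguments; otherwise you must pass a gradient argument whose shape matches that of the tensor.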
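Here is a minimal sketch of calling `.backward()` on a non-scalar tensor. The `gradient` argument `v` has the same shape as `y`, and autograd computes the vector-Jacobian product vᵀJ. Because every element of `y` here is `2 * x`, the resulting `x.grad` is simply `2 * v`. The values of `v` are arbitrary.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2  # y is a vector, not a scalar

# For a non-scalar y, backward() needs a `gradient` tensor with the same shape as y;
# autograd then computes the vector-Jacobian product v^T J
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradient=v)
print(x.grad)  # equals 2 * v, since dy_i/dx_i = 2 and the Jacobian is diagonal
```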

# Set requires_grad=True
In [45]: x = torch.ones(2, 2, requires_grad=True)

In [46]: x
Out[46]:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

# .requires_grad_() changes the requires_grad flag in place (the flag defaults to False)
In [55]: a = torch.randn(2, 2)

In [56]: a = ((a*3)/(a-1))

In [57]: print(a.requires_grad)
False

In [58]: a.requires_grad_(True)
Out[58]:
tensor([[ 1.4352, -1.8208],
        [ 0.7062, 15.4992]], requires_grad=True)

In [59]: a.requires_grad
Out[59]: True

# Compute gradients
In [60]: x = torch.ones(2, 2, requires_grad=True)

In [61]: x
Out[61]:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

In [62]: y = x + 2

In [63]: y
Out[63]:
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

In [64]: print(y.grad_fn)
<AddBackward0 object at 0x000001BED14C2D30>

In [65]: z = y * y * 3

In [66]: out = z.mean()

In [67]: out
Out[67]: tensor(27., grad_fn=<MeanBackward1>)

In [68]: out.backward()

In [69]: print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
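The value 4.5 can be checked by hand. Write o for out. Each of the four elements gives z_i = 3(x_i + 2)^2, and o is their mean:

```latex
\begin{aligned}
o &= \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3 y_i^2 = 3(x_i + 2)^2 \\
\frac{\partial o}{\partial x_i} &= \frac{1}{4} \cdot 6(x_i + 2) = \frac{3}{2}(x_i + 2) \\
\left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} &= \frac{3}{2} \cdot 3 = 4.5
\end{aligned}
```

With x_i = 1, every z_i equals 27, so o = 27 as well. This matches the tensor(27.) printed above.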

# Inside a with torch.no_grad(): block, new results are no longer tracked
In [70]: x.requires_grad
Out[70]: True

In [71]: (x ** 2).requires_grad
Out[71]: True

In [72]: with torch.no_grad():
    ...:     print((x ** 2).requires_grad)
    ...:
False

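4 ways to build a network

The network to build: convolution -> ReLU -> pooling -> fully connected -> ReLU -> fully connected. The 3x3 convolution with padding 1 keeps the spatial size and the 2x2 max pooling halves it. The first fully connected layer's 32*3*3 inputs therefore imply 3-channel 6x6 input images.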

import torch
import torch.nn.functional as F
from collections import OrderedDict

Method 1: build the network as in the official beginner tutorial [1].

# Method 1

class Net1(torch.nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 32, 3, 1, 1)
        self.dense1 = torch.nn.Linear(32*3*3, 128)
        self.dense2 = torch.nn.Linear(128, 10)
        
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = x.view(x.size(0), -1)  # flatten everything except the batch dimension
        x = F.relu(self.dense1(x))
        x = self.dense2(x)
        return x
    
print("Method 1:")
model1 = Net1()
print(model1)


Method 1:
Net1(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)

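Method 2: use Sequential() to build the network quickly. However, each layer is named with a default numeric index, which makes the layers hard to tell apart.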

# Method 2

class Net2(torch.nn.Module):
    
    def __init__(self):
        super(Net2, self).__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3, 1, 1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2))
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(32*3*3, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10))
        
    def forward(self, x):
        conv_out = self.conv(x)  # Net2 defines self.conv, not self.conv1
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out

print("Method 2:")
model2 = Net2()
print(model2)


Method 2:
Net2(
  (conv): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense): Sequential(
    (0): Linear(in_features=288, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=10, bias=True)
  )
)

Method 3: building on Method 2, use .add_module() to add each layer, which lets you give every layer its own name.

# Method 3

class Net3(torch.nn.Module):
    
    def __init__(self):
        super(Net3, self).__init__()
        self.conv = torch.nn.Sequential()
        self.conv.add_module("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv.add_module("relu1", torch.nn.ReLU())
        self.conv.add_module("pool1", torch.nn.MaxPool2d(2))
        self.dense = torch.nn.Sequential()
        self.dense.add_module("dense1", torch.nn.Linear(32*3*3, 128))
        self.dense.add_module("relu2", torch.nn.ReLU())
        self.dense.add_module("dense2", torch.nn.Linear(128, 10))
        
    def forward(self, x):
        conv_out = self.conv(x)  # Net3 defines self.conv, not self.conv1
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out

print("Method 3:")
model3 = Net3()
print(model3)


Method 3:
Net3(
  (conv): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)

Method 4: add the layers as entries of an ordered dictionary (OrderedDict), giving each layer its own name.

# Method 4

class Net4(torch.nn.Module):
    
    def __init__(self):
        super(Net4, self).__init__()
        self.conv = torch.nn.Sequential(
            OrderedDict(
                [
                    ("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1)),
                    ("relu1", torch.nn.ReLU()),
                    ("pool", torch.nn.MaxPool2d(2))
                ])
        )
        self.dense = torch.nn.Sequential(
            OrderedDict(
                [
                    ("dense1", torch.nn.Linear(32*3*3, 128)),
                    ("relu2", torch.nn.ReLU()),
                    ("dense2", torch.nn.Linear(128, 10))
                ])
        )
    
    def forward(self, x):
        conv_out = self.conv(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense(res)
        return out

print("Method 4:")
model4 = Net4()
print(model4)


Method 4:
Net4(
  (conv): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)
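To check that the layer sizes fit together, the sketch below rebuilds the same architecture as a single `Sequential` and runs a dummy batch through it. It assumes 3-channel 6x6 inputs, as derived from `32*3*3` above. It also uses `torch.nn.Flatten` (available since PyTorch 1.2) in place of the manual `view()` call, so it is a variant of the four methods rather than one of them.

```python
from collections import OrderedDict

import torch

# Same layers as Net4, written as one Sequential; nn.Flatten replaces
# the manual view(x.size(0), -1) call in forward()
model = torch.nn.Sequential(OrderedDict([
    ("conv1", torch.nn.Conv2d(3, 32, 3, 1, 1)),
    ("relu1", torch.nn.ReLU()),
    ("pool", torch.nn.MaxPool2d(2)),
    ("flatten", torch.nn.Flatten()),
    ("dense1", torch.nn.Linear(32 * 3 * 3, 128)),
    ("relu2", torch.nn.ReLU()),
    ("dense2", torch.nn.Linear(128, 10)),
]))

# Assumed input: a batch of 8 images with 3 channels and 6x6 pixels, so that
# 2x2 pooling leaves 32 feature maps of 3x3 = 288 features for dense1
batch = torch.randn(8, 3, 6, 6)
out = model(batch)
print(out.shape)  # torch.Size([8, 10])

# conv1: 3*32*9 + 32, dense1: 288*128 + 128, dense2: 128*10 + 10
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 39178
```

Named layers can also be reached as attributes, for example `model.conv1`.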

Loading data

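import torch
import torch.utils.data as Data

# X_train, y_train, X_test, y_test are assumed to be NumPy arrays loaded earlier
# (for example MNIST, with 60000 training samples and integer labels 0-9)
X_train, y_train = torch.from_numpy(X_train), torch.from_numpy(y_train)
X_test, y_test = torch.from_numpy(X_test), torch.from_numpy(y_test)
y_train = y_train.long()
# one-hot encoding: scatter_ needs an index tensor of shape (N, 1)
y_train = torch.zeros(y_train.size(0), 10).scatter_(1, y_train.view(-1, 1), 1)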

train_dataset = Data.TensorDataset(X_train, y_train) # make a training dataset
test_dataset = Data.TensorDataset(X_test, y_test) # make a test dataset

trainloader = Data.DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2)
testloader = Data.DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=2)
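Because the snippet above depends on arrays loaded elsewhere, here is a self-contained sketch built on random stand-in data. The shapes and names are hypothetical. It checks that the `scatter_` one-hot encoding matches `torch.nn.functional.one_hot`, then pulls one batch from a `DataLoader`.

```python
import numpy as np
import torch
import torch.utils.data as Data

# Hypothetical stand-in data: 100 "images" of 1x28x28 with labels 0-9
rng = np.random.default_rng(0)
X = rng.random((100, 1, 28, 28), dtype=np.float32)
y = rng.integers(0, 10, size=100)

X_t = torch.from_numpy(X)
y_t = torch.from_numpy(y).long()

# Two equivalent ways to one-hot encode integer labels
one_hot_scatter = torch.zeros(len(y_t), 10).scatter_(1, y_t.view(-1, 1), 1)
one_hot_builtin = torch.nn.functional.one_hot(y_t, num_classes=10).float()

dataset = Data.TensorDataset(X_t, one_hot_scatter)
# num_workers=0 keeps this sketch simple; with worker processes on Windows,
# the loading code must sit under an `if __name__ == "__main__":` guard
loader = Data.DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32, 10])
print(len(loader))         # 4 batches: 32 + 32 + 32 + 4
```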

References

[1]. PyTorch - Deep Learning with PyTorch: A 60 Minute Blitz
[2]. Four Ways to Build a Neural Network in PyTorch (Pytorch之搭建神经网络的四种方法)