Einstein Summation

2019-07-12 readilen

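NumPy, PyTorch, and TensorFlow all provide a very general but little-known function called einsum(), which performs summations according to Einstein's summation convention. One benefit of PyTorch and TensorFlow supporting einsum the way NumPy does is that einsum can be used inside the arbitrary computation graphs of neural network architectures and can be backpropagated through. It is an efficient notation for all kinds of summation operations on matrices, and this tutorial demystifies einsum().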

Why learn it

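Einstein summation provides a compact and elegant way to specify almost any summation over scalars, vectors, matrices, and tensors. It is very general, yet it reduces both the number of mistakes people make and the time they spend reasoning about linear algebra. It does so by being at once clearer, more explicit, more self-documenting, more declarative, and less of a cognitive burden. Its advantage over routines such as matrix multiplication is that the user no longer has to think about: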

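The correct order in which to supply the argument tensors
The correct transpositions to apply to the argument tensors
Making sure the right tensor dimensions line up with one another
The correct transposition to apply to the result tensor

Einstein summation is indeed named after the famous physicist and theorist Albert Einstein. However, Einstein had no part in its development; he merely popularized it by using it in his own expressions. In a letter to Tullio Levi-Civita, who developed Ricci calculus together with Gregorio Ricci-Curbastro (the summation notation is only one part of that calculus), Einstein wrote: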

I admire the elegance of your method of computation; it must be nice to ride through these fields upon the horse of true mathematics while the like of us have to make our way laboriously on foot.

Einstein himself praised this notation highly.
NB: As a further aside, the most general formulation of Einstein summation involves topics such as covariance and contravariance, indicated by subscript and superscript indices respectively. For our purposes, we will ignore co-/contravariance, since we can and will choose the “basis” we operate in to make the complexities that they introduce disappear.

How einsum works

Once you have learned the notation, Einstein summation is very convenient to use.

Importing the libraries

import tensorflow as tf
import numpy as np
import torch

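einsum() takes a format string and any number of argument tensors, and returns a result tensor.

Usage

Format string syntax:

Commas separate the argument specifications; the number of specifications must match the number of argument tensors
An arrow separates the argument specifications from the result specification; NumPy also accepts a format string without an arrow (implicit mode), but this article always writes the arrow
Each argument or result specification is a sequence of ASCII letters
The number of characters in a tensor's specification is exactly equal to the number of dimensions of that tensor

An example: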

v = np.arange(100)
M = np.arange(16).reshape(4,4)
A = np.arange(25).reshape(5,5)
B = np.arange(20).reshape(5,4)
s = np.einsum('a->', v)
T = np.einsum('ij->ji', M)
C = np.einsum('mn,np->mp', A,B)

assert v.ndim == len('a')
assert s.ndim == len('')
assert M.ndim == len('ij')
assert T.ndim == len('ji')
assert A.ndim == len('mn')
assert B.ndim == len('np')
assert C.ndim == len('mp')
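
The format string rules can also be tried without the arrow. The following is a minimal sketch of NumPy's implicit mode, using small arrays chosen only for illustration (they are not the arrays from the example above): labels that appear exactly once are kept in alphabetical order, and repeated labels are summed over.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# Explicit mode: the output labels are spelled out after the arrow.
explicit = np.einsum('ij,jk->ik', A, B)

# Implicit mode: no arrow. Labels that appear exactly once ('i' and 'k')
# are kept in alphabetical order; the repeated label 'j' is summed over.
implicit = np.einsum('ij,jk', A, B)

# Because implicit output labels are sorted alphabetically, 'ji' without
# an arrow means 'ji->ij', which transposes the matrix.
transposed = np.einsum('ji', A)
```

Writing the arrow explicitly, as the rest of this article does, avoids surprises from the alphabetical ordering.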

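Mechanics

In Einstein summation as implemented by numpy.einsum(), every axis of every tensor is labeled with a letter, which names the index used when iterating over that axis. einsum() can then be expressed straightforwardly as a set of deeply nested for loops, at whose core is a sum of products of the arguments. Examples follow.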
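
To make the nested-loop picture concrete, here is a rough sketch of what the format string 'mn,np->mp' describes. The helper name einsum_matmul_loops is made up for this illustration; it only mirrors the semantics and is not how NumPy implements einsum internally.

```python
import numpy as np

def einsum_matmul_loops(A, B):
    """Loop form of np.einsum('mn,np->mp', A, B) (illustrative helper)."""
    M, N = A.shape
    N2, P = B.shape
    assert N == N2, "the shared label 'n' must have the same size in both arguments"
    C = np.zeros((M, P), dtype=np.result_type(A, B))
    for m in range(M):          # free index: appears in the output
        for p in range(P):      # free index: appears in the output
            for n in range(N):  # summed index: absent from the output
                C[m, p] += A[m, n] * B[n, p]
    return C

A = np.arange(25).reshape(5, 5)
B = np.arange(20).reshape(5, 4)
C_loops = einsum_matmul_loops(A, B)
C_einsum = np.einsum('mn,np->mp', A, B)
```

Labels that appear after the arrow become the outer loops; labels that appear only before the arrow become the inner loops that accumulate the sum.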

Matrix transpose

import torch
a = torch.arange(24).reshape(4, 6)
torch.einsum('ij->ji', [a])
tensor([[ 0,  6, 12, 18],
        [ 1,  7, 13, 19],
        [ 2,  8, 14, 20],
        [ 3,  9, 15, 21],
        [ 4, 10, 16, 22],
        [ 5, 11, 17, 23]])

Sum of all matrix elements

a = torch.arange(6).reshape(2, 3)
torch.einsum('ij->', [a])
tensor(15)

Column sums

a = torch.arange(6).reshape(2, 3)
torch.einsum('ij->j', [a])
tensor([3, 5, 7])
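
Row sums follow the same pattern with the other index kept. A minimal sketch, assuming the same 2 x 3 input as the column-sum example:

```python
import torch

a = torch.arange(6).reshape(2, 3)

# Keep the row index 'i' and sum over the column index 'j':
# the rows [0, 1, 2] and [3, 4, 5] sum to 3 and 12.
row_sums = torch.einsum('ij->i', a)
```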

Matrix-vector multiplication

a = torch.arange(6).reshape(2, 3)
b = torch.arange(3)
torch.einsum('ik,k->i', [a, b])
tensor([ 5, 14])

Matrix-matrix multiplication

a = torch.arange(6).reshape(2, 3)
b = torch.arange(15).reshape(3, 5)
torch.einsum('ik,kj->ij', [a, b])
tensor([[ 25,  28,  31,  34,  37],
        [ 70,  82,  94, 106, 118]])

Diagonal of a 2D matrix

a = torch.arange(9).reshape(3, 3)
torch.einsum('ii->i', a)
tensor([0, 4, 8])

Trace of a 2D matrix

a = torch.arange(9).reshape(3, 3)
torch.einsum('ii->', a)
tensor(12)

Quadratic form
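
No code for this case appears here, so the following is only a sketch of how a quadratic form x^T A x might be written with einsum. The vector x and matrix A are arbitrary illustrative values.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
A = np.arange(9, dtype=float).reshape(3, 3)

# Both indices are summed away, leaving the scalar sum_ij x_i * A_ij * x_j.
q = np.einsum('i,ij,j->', x, A, x)

# Reference value computed with ordinary matrix products.
q_ref = x @ A @ x
```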

Batch matrix multiplication

a = torch.randn(3,2,5)
b = torch.randn(3,5,3)
torch.einsum('ijk,ikl->ijl', [a, b])
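
As a sanity check, the batched einsum above should agree with torch.bmm, which multiplies the matrices of two batches pairwise. A small sketch with a fixed seed (the seed and the comparison are additions for illustration):

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 2, 5)   # batch of 3 matrices, each 2 x 5
b = torch.randn(3, 5, 3)   # batch of 3 matrices, each 5 x 3

# 'i' is the batch index and is kept; 'k' is shared and summed over.
out = torch.einsum('ijk,ikl->ijl', a, b)

# Reference: built-in batched matrix multiplication.
ref = torch.bmm(a, b)
```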

Dot product

a = torch.arange(3)
b = torch.arange(3,6)  # [3, 4, 5]
torch.einsum('i,i->', [a, b])

An MLP example

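# 15: MLP Backprop done easily (stochastic version).
#     h = sigmoid(Wx + b)
#     y = softmax(Vh + c)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoidgrad(z):
    # Derivative of sigmoid with respect to its pre-activation z.
    s = sigmoid(z)
    return s * (1.0 - s)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

Ni = 784
Nh = 500
No =  10
 
W  = np.random.normal(size = (Nh,Ni))  # Nh x Ni
b  = np.random.normal(size = (Nh,))    # Nh
V  = np.random.normal(size = (No,Nh))  # No x Nh
c  = np.random.normal(size = (No,))    # No
 
# Load x and t. The original reads them from a dataset:
#     x, t = train_set[k]
# train_set and k are not defined in this article, so random stand-ins
# (an assumption, not real data) are used to keep the snippet runnable.
x  = np.random.normal(size = (Ni,))    # Ni
t  = np.eye(No)[3]                     # No, one-hot target for an arbitrary class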
 
# With a judicious, consistent choice of index labels, we can
# express fprop() and bprop() extremely tersely; No thought
# needs to be given about the details of shoehorning matrices
# into np.dot(), such as the exact argument order and the
# required transpositions.
#
# Let
#
#     'i' be the input  dimension label.
#     'h' be the hidden dimension label.
#     'o' be the output dimension label.
#
# Then
 
# Fprop
ha    = np.einsum("hi, i -> h", W, x) + b
h     = sigmoid(ha)
ya    = np.einsum("oh, h -> o", V, h) + c
y     = softmax(ya)
 
# Bprop
dLdya = y - t
dLdV  = np.einsum("h , o -> oh", h, dLdya)
dLdc  = dLdya
dLdh  = np.einsum("oh, o -> h ", V, dLdya)
dLdha = dLdh * sigmoidgrad(ha)
dLdW  = np.einsum("i,  h -> hi", x, dLdha)
dLdb  = dLdha
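
To see what the index labels save, the sketch below compares two of the backprop expressions with their np.outer / matrix-product equivalents. The sizes and random values are small, hypothetical stand-ins rather than the 784/500/10 network above.

```python
import numpy as np

rng = np.random.default_rng(0)
Nh, No = 4, 3                      # small, hypothetical layer sizes
V = rng.normal(size=(No, Nh))      # No x Nh
h = rng.normal(size=(Nh,))         # Nh
dLdya = rng.normal(size=(No,))     # No

# Gradient with respect to V: an outer product.
dLdV_einsum = np.einsum("h,o->oh", h, dLdya)
dLdV_outer = np.outer(dLdya, h)    # note the swapped argument order

# Gradient with respect to h: a product with the transposed matrix.
dLdh_einsum = np.einsum("oh,o->h", V, dLdya)
dLdh_matmul = V.T @ dLdya          # note the explicit transpose
```

With einsum the argument order and the transposition are carried by the labels, whereas the equivalent calls need them worked out by hand.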

TreeQN

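I once used einsum when implementing Equation 6 of TreeQN (arXiv:1710.11417): given a low-dimensional state representation z_l at layer l and a transition function W^a for each action a, we want to compute the next-layer state representations through a residual connection.

In practice, we want to do this efficiently for a batch of size B of K-dimensional state representations, Z ∈ ℝ^(B × K), and for all transition functions (that is, for all A actions) at once. We can arrange the transition functions into a tensor W ∈ ℝ^(A × K × K) and use einsum to compute the next-layer state representations efficiently.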

import torch
import torch.nn.functional as F

def random_tensors(shape, num=1, requires_grad=False):
    tensors = [torch.randn(shape, requires_grad=requires_grad) for _ in range(num)]
    return tensors[0] if num == 1 else tensors

# Parameters
# -- [number of actions x hidden dimension]
b = random_tensors([5, 3], requires_grad=True)
# -- [number of actions x hidden dimension x hidden dimension]
W = random_tensors([5, 3, 3], requires_grad=True)

def transition(zl):
    # -- [batch size x number of actions x hidden dimension]
    # torch.tanh replaces the deprecated F.tanh
    return zl.unsqueeze(1) + torch.tanh(torch.einsum("bk,aki->bai", [zl, W]) + b)

# Sample dummy inputs
# -- [batch size x hidden dimension]
zl = random_tensors([2, 3])

transition(zl)
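
One way to convince yourself that the single einsum covers every action at once is to compare it with an explicit loop over actions. This is a sketch with freshly sampled tensors and a fixed seed; the names B, A, and K are introduced here for the batch size, number of actions, and state dimension.

```python
import torch

torch.manual_seed(0)
B, A, K = 2, 5, 3                 # batch size, number of actions, state dimension
zl = torch.randn(B, K)
W = torch.randn(A, K, K)
b = torch.randn(A, K)

# Batched form: one einsum computes the transition for all actions at once.
batched = zl.unsqueeze(1) + torch.tanh(torch.einsum("bk,aki->bai", zl, W) + b)

# Loop form: apply each action's matrix separately, then stack along dim 1.
looped = torch.stack(
    [zl + torch.tanh(zl @ W[a] + b[a]) for a in range(A)], dim=1
)
```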

Attention

# Parameters
# -- [hidden dimension]
bM, br, w = random_tensors([7], num=3, requires_grad=True)
# -- [hidden dimension x hidden dimension]
WY, Wh, Wr, Wt = random_tensors([7, 7], num=4, requires_grad=True)

# Single application of the attention mechanism
def attention(Y, ht, rt1):
    # -- [batch size x hidden dimension]
    tmp = torch.einsum("ik,kl->il", [ht, Wh]) + torch.einsum("ik,kl->il", [rt1, Wr])
    Mt = torch.tanh(torch.einsum("ijk,kl->ijl", [Y, WY]) + tmp.unsqueeze(1).expand_as(Y) + bM)
    # -- [batch size x sequence length]
    # dim=1 normalizes over the sequence; omitting dim is deprecated
    at = F.softmax(torch.einsum("ijk,k->ij", [Mt, w]), dim=1)
    # -- [batch size x hidden dimension]
    rt = torch.einsum("ijk,ij->ik", [Y, at]) + torch.tanh(torch.einsum("ij,jk->ik", [rt1, Wt]) + br)
    # -- [batch size x hidden dimension], [batch size x sequence length]
    return rt, at

# Sample dummy inputs
# -- [batch size x sequence length x hidden dimension]
Y = random_tensors([3, 5, 7])
# -- [batch size x hidden dimension]
ht, rt1 = random_tensors([3, 7], num=2)

rt, at = attention(Y, ht, rt1)
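
The "ijk,ij->ik" term inside attention() is the attention-weighted sum over the sequence. The sketch below isolates that one step and checks it against a broadcasting version; the tensors are random stand-ins with the same shapes as above.

```python
import torch

torch.manual_seed(0)
Y = torch.randn(3, 5, 7)                       # [batch size x sequence length x hidden dimension]
at = torch.softmax(torch.randn(3, 5), dim=1)   # [batch size x sequence length], rows sum to 1

# einsum form: for each batch element, sum over the sequence index 'j'.
ctx_einsum = torch.einsum("ijk,ij->ik", Y, at)

# Broadcasting form: weight every time step, then sum over dim 1.
ctx_manual = (Y * at.unsqueeze(-1)).sum(dim=1)
```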
