Tesorflow中计算Pairwise的Euclidean D

2019-06-20 本文已影响0人 Nevrast

问题场景

已知两组向量为：
$\begin{array}{l}{X=\left\{\mathbf{x}_{1}, \mathbf{x}_{2}, \cdots, \mathbf{x}_{n}\right\}} \\ {Y=\left\{\mathbf{y}_{1}, \mathbf{y}_{2}, \cdots, \mathbf{y}_{m}\right\}}\end{array}$
现在要计算 $X$ 中每一个向量和 $Y$ 中每一个向量的欧式距离。

解决思路一

把 $X$ 中向量使用tf.tile复制m次，把 $Y$ 中向量复制n次。
$\left[\begin{array}{cccc}{\mathbf{x}_{1}} & {\mathbf{x}_{1}} & {\cdots} & {\mathbf{x}_{1}} \\ {\mathbf{x}_{2}} & {\mathbf{x}_{2}} & {\cdots} & {\mathbf{x}_{2}} \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ {\mathbf{x}_{n}} & {\mathbf{x}_{n}} & {\cdots} & {\mathbf{x}_{n}}\end{array}\right] - \left[\begin{array}{cccc}{\mathbf{y}_{1}} & {\mathbf{y}_{2}} & {\cdots} & {\mathbf{y}_{m}} \\ {\mathbf{y}_{1}} & {\mathbf{y}_{2}} & {\cdots} & {\mathbf{y}_{m}} \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ {\mathbf{y}_{1}} & {\mathbf{y}_{2}} & {\cdots} & {\mathbf{y}_{m}}\end{array}\right]$
然后按照向量距离公式 $\|\mathbf{x}-\mathbf{y}\|$ 就可以得到 $n\times m$ 的距离矩阵了。
缺点：

计算量大，复杂度大概是 $n\times m \times k$ ， $k$ 为向量维度。
tf.tile后要保存两个很大的矩阵，占资源

解决思路二

利用完全平方公式，对于距离矩阵 $D$ 中的 $D_{ij}$ 元素(表示 $X$ 中第 $i$ 个向量和 $Y$ 中的第 $j$ 个向量之间的欧式距离):
$\begin{aligned} D_{i j} &=\left(\mathbf{x}_{i}-\mathbf{y}_{j}\right)^{2} \\ &=\mathbf{x}_{i}^{2}-2 \mathbf{x}_{i} \mathbf{y}_{j}+\mathbf{y}_{j}^{2} \end{aligned}$
其中的 $\mathbf{x}_{i} \mathbf{y}_{j}$ 矩阵可以由两个向量相乘快速计算。

import tensorflow as tf


def euclidean_dist(x, y):
    square_x = tf.reduce_sum(tf.square(x), axis=-1)
    square_y = tf.reduce_sum(tf.square(y), axis=-1)
    # expand dims for broadcasting
    ex = tf.expand_dims(square_x, axis=-1)
    ey = tf.expand_dims(square_y, axis=-2)
    # XY matrix
    xy = tf.einsum('bij,bkj->bik', x, y)
    # 如果没有batch_size这个维度，可以写成：
    # xy = tf.einsum('ij,kj->ik', x, y)
    # compute distance，浮点防溢出
    dist = tf.sqrt(ex - 2 * xy + ey + 1e-10)
    return dist

计算dist的时候加上了1e-10，因为 $\sqrt{x}$ 在 $x=0$ 处导数不存在，需要做平滑。

优点：

$X$ 和 $Y$ 中向量的平方不用重复计算，加速运算。
$xy$ 矩阵是标量矩阵，节约显存资源，这个有时很重要！

Tesorflow中计算Pairwise的Euclidean D

问题场景

解决思路一

解决思路二

猜你喜欢

热点阅读