AGNN Paper Notes
AGNN: Attention-based Graph Neural Network for Semi-supervised Learning (Mar 2018)
Background
Graph neural network (GNN) architectures alternate between a propagation layer, which aggregates the hidden states of the local neighborhood, and a fully connected layer.
AGNN:
- By removing the intermediate fully connected layers, the number of model parameters is reduced. In semi-supervised learning tasks labeled data are scarce, so this leaves more room for designing an innovative propagation layer;
- An attention mechanism is used in the propagation layer to dynamically adjust the local neighborhood information, which yields higher accuracy.
Contribution
- Using a linear classifier (multinomial logistic regression) and removing the intermediate non-linear activation layers, keeping only the linear neighborhood propagation of the graph neural network, achieves results comparable to the best graph models; this shows the importance of aggregating neighborhood information on the graph;
- Based on an attention mechanism:
  - reduces model complexity: each intermediate layer has only a single scalar parameter;
  - dynamically and adaptively discovers which nodes are relevant to the target node's classification.
Model
Node feature vectors: $X_i \in \mathbb{R}^{d_x}$ for every node $i \in V$, stacked into $X \in \mathbb{R}^{n \times d_x}$,
Labels: $Y_i \in \{1, \dots, C\}$ (one of $C$ classes),
Labeled subset: labels are observed only on a small subset $V_{\text{label}} \subset V$.
Assuming that neighboring nodes are more likely to share the same label, the classical objective combines:
- Supervised loss: $\sum_{i \in V_{\text{label}}} \ell\big(f(X_i), Y_i\big)$
- Laplacian regularization: $\lambda \sum_{i,j} A_{ij}\,\| f(X_i) - f(X_j) \|^2$
Adjacency matrix: $A \in \mathbb{R}^{n \times n}$, with $A_{ij} > 0$ iff $(i, j) \in E$.
Objective: learn $Z = f(X, A) \in \mathbb{R}^{n \times C}$ to predict the label of every node, where:
$Z_{ic}$: the probability that node $i$ belongs to class $c$.
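A minimal NumPy sketch of this classical graph-SSL objective, assuming dense arrays and a squared-error supervised loss for illustration; the function and variable names are mine, not from the paper:

```python
import numpy as np

def graph_ssl_loss(F, Y, A, labeled, lam=1.0):
    """Supervised loss on labeled nodes + Laplacian regularization.

    F       : (n, C) predictions f(X_i) for every node
    Y       : (n, C) one-hot labels (rows of unlabeled nodes are ignored)
    A       : (n, n) adjacency matrix
    labeled : (n,) boolean mask of labeled nodes
    lam     : weight of the Laplacian regularizer
    """
    # Supervised term: a simple squared error over the labeled nodes.
    sup = np.sum((F[labeled] - Y[labeled]) ** 2)

    # Laplacian regularization via the unnormalized Laplacian L = D - A:
    # trace(F^T L F) equals 0.5 * sum_ij A_ij * ||F_i - F_j||^2.
    D = np.diag(A.sum(axis=1))
    reg = np.trace(F.T @ (D - A) @ F)

    return sup + lam * reg
```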
Propagation Layer
Hidden state at layer $t$: $H^{(t)} \in \mathbb{R}^{n \times d_h}$
Propagation matrix: $P \in \mathbb{R}^{n \times n}$
Propagation layer: $H^{(t+1)} = P H^{(t)}$, where $P$ can be a local average or a random walk:
- Random walk: $P = D^{-1} A$, with degree matrix $D = \mathrm{diag}(A\mathbf{1})$
A single propagation layer combined with a fully connected layer: $H^{(t+1)} = \sigma\big(P H^{(t)} W^{(t)}\big)$
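A minimal NumPy sketch of the random-walk propagation matrix and one generic propagation + fully connected layer, assuming $\sigma = \mathrm{ReLU}$; the names are illustrative:

```python
import numpy as np

def random_walk_propagation(A):
    """P = D^{-1} A : each row averages the states of the node's neighbors."""
    deg = A.sum(axis=1, keepdims=True)
    return A / np.maximum(deg, 1e-12)

def gnn_layer(H, P, W):
    """One generic layer: H^{(t+1)} = sigma(P H^{(t)} W^{(t)}), with sigma = ReLU."""
    return np.maximum(P @ H @ W, 0.0)
```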
GCN
GCN is a special case of GNN that stacks two layers of a specific propagation and perceptron:
$Z = f(X, A) = \mathrm{softmax}\big(\hat{A}\,\mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)}\big)$
where: $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ and $\tilde{D} = \mathrm{diag}\big((A + I)\mathbf{1}\big)$
Loss (cross-entropy over the labeled nodes): $\mathcal{L} = -\sum_{i \in V_{\text{label}}} \sum_{c=1}^{C} Y_{ic} \ln Z_{ic}$
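A NumPy sketch of the two-layer GCN forward pass and the masked cross-entropy loss above; only the forward computation is shown (no training loop), and the variable names are mine:

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, the GCN propagation matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def softmax(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(X, A, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W0) W1)."""
    A_hat = normalized_adjacency(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)   # first layer: propagate + ReLU
    return softmax(A_hat @ H @ W1)        # second layer: propagate + linear classifier

def masked_cross_entropy(Z, Y, labeled):
    """L = - sum over labeled nodes of sum_c Y_ic * ln Z_ic."""
    return -np.sum(Y[labeled] * np.log(Z[labeled] + 1e-12))
```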
GLN
Removing GCN's intermediate non-linear activation gives GLN: $Z = \mathrm{softmax}\big(\hat{A}\,\hat{A}\, X\, W^{(0)} W^{(1)}\big)$
The two propagation layers simply take a linear local average of the raw features, weighted by node degrees, and at the output layer a simple linear classifier (multinomial logistic regression) is applied.
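Reusing the helpers from the GCN sketch above, GLN just drops the ReLU (a hypothetical sketch, not the authors' code):

```python
def gln_forward(X, A, W0, W1):
    """Z = softmax(A_hat A_hat X W0 W1): pure linear propagation + linear classifier."""
    A_hat = normalized_adjacency(A)
    return softmax(A_hat @ (A_hat @ X @ W0) @ W1)
```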
AGNN
In GCN the layer-to-layer propagation is static and does not take the states of the nodes into account (no adaptive propagation).
For example, with $P_{ij} = 1 / \sqrt{|N(i)|\,|N(j)|}$ there is no way to tell which neighboring node is more relevant to the node being classified.
Embedding layer: $H^{(1)} = \mathrm{ReLU}\big(X W^{(0)}\big)$, with $W^{(0)} \in \mathbb{R}^{d_x \times d_h}$
Attention-guided propagation layers: $H^{(t+1)} = P^{(t)} H^{(t)}$, for $t = 1, \dots, \ell$
Output row-vector of node $i$: $H_i^{(t+1)} = \sum_{j \in N(i) \cup \{i\}} P_{ij}^{(t)} H_j^{(t)}$
where:
Attention from node $j$ to node $i$ is: $P_{ij}^{(t)} = \dfrac{\exp\big(\beta^{(t)} \cos(H_i^{(t)}, H_j^{(t)})\big)}{\sum_{k \in N(i) \cup \{i\}} \exp\big(\beta^{(t)} \cos(H_i^{(t)}, H_k^{(t)})\big)}$, with $\cos(x, y) = x^\top y / (\|x\|\,\|y\|)$ and a single scalar parameter $\beta^{(t)}$ per layer.
Self-loops are added in the propagation: the softmax runs over $N(i) \cup \{i\}$, so each node also attends to itself.
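A NumPy sketch of one attention-guided propagation layer as defined above, assuming a dense 0/1 adjacency matrix used as the neighborhood mask and `beta` as the layer's single scalar parameter; the names are illustrative:

```python
import numpy as np

def agnn_propagation(H, A, beta):
    """One AGNN layer: H^{(t+1)} = P^{(t)} H^{(t)} with cosine-similarity attention.

    H    : (n, d_h) hidden states from the previous layer
    A    : (n, n) adjacency matrix (0/1)
    beta : scalar attention parameter of this layer
    """
    n = A.shape[0]
    mask = (A + np.eye(n)) > 0                      # neighbors plus self-loop

    # Cosine similarity between every pair of node states.
    norm = np.linalg.norm(H, axis=1, keepdims=True) + 1e-12
    cos = (H @ H.T) / (norm @ norm.T)

    # Softmax of beta * cos restricted to each node's neighborhood.
    scores = np.where(mask, beta * cos, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)     # stabilize; exp(-inf) -> 0
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)

    return P @ H
```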