
2019-01-13 · 数据小新手

What is node2vec and how does it work?

Recently, I ran into some problems dealing with graph data. Graphs appear in many fields, such as social networks, knowledge graphs, and so on. While working on this, I found node2vec to be a good method for learning node embeddings. Before reading on, you may want to know what word2vec is and what an embedding is.

In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. It defines a flexible notion of a node's network neighborhood and designs a biased random walk procedure to sample it.

The algorithm follows two principles:

1. the ability to learn representations that embed nodes from the same network community closely together;

2. the ability to learn representations where nodes that share similar roles have similar embeddings.

The key contribution is in defining a flexible notion of a node’s network neighborhood by developing a family of biased random walks, which efficiently explore diverse neighborhoods of a given node.

Algorithm

We view feature learning in networks as a maximum likelihood optimization problem. Let G = (V, E) be a given network. The analysis is general and applies to any (un)directed, (un)weighted network. Let f : V → \mathbb{R}^d be the mapping function from nodes to feature representations that we aim to learn for a downstream prediction task. Here d is a parameter specifying the number of dimensions of our feature representation. Equivalently, f is a matrix of |V| × d parameters. For every source node u ∈ V, we define N_S(u) ⊂ V as a network neighborhood of node u generated through a neighborhood sampling strategy S.
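
For concreteness, f can be stored as a plain |V| × d matrix. A minimal NumPy sketch (the node count and dimension below are arbitrary toy values, not values from the paper):

```python
import numpy as np

num_nodes, d = 100, 16  # toy |V| and embedding dimension
rng = np.random.default_rng(0)
# row u of this matrix is the feature vector f(u), learned during training
f = rng.normal(scale=0.1, size=(num_nodes, d))
```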

Objective function:
max_f \sum_{u\in V} \log Pr(N_S(u)\mid f(u))
Assumptions:

1. Conditional independence. The likelihood of observing a neighborhood node is independent of observing any other neighborhood node, given the feature representation of the source:
Pr(N_S(u)\mid f(u)) = \prod_{n_i \in N_S(u)} Pr(n_i\mid f(u))

2. Symmetry in feature space. A source node and a neighborhood node have a symmetric effect on each other in feature space, so every source–neighborhood node pair is modeled as a softmax unit parametrized by the dot product of their features:
Pr(n_i\mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v\in V} \exp(f(v) \cdot f(u))}
With the above assumptions, the objective simplifies to:
max_f \sum_{u\in V}\left[-\log Z_u + \sum_{n_i\in N_S(u)} f(n_i) \cdot f(u)\right]
where Z_u = \sum_{v\in V} \exp(f(u) \cdot f(v)). The per-node partition function Z_u is expensive to compute exactly for large networks, so it is approximated with negative sampling.
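
To make the objective concrete, here is a minimal NumPy sketch that evaluates the softmax Pr(n_i | f(u)) and accumulates the log-likelihood over a toy set of sampled neighborhoods; the embedding matrix `f` and the `neighborhoods` dict are made-up stand-ins for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, d = 10, 4
f = rng.normal(size=(num_nodes, d))   # toy embedding matrix; row u is f(u)
neighborhoods = {0: [1, 2], 3: [4]}   # toy sampled N_S(u) for two source nodes

objective = 0.0
for u, ns in neighborhoods.items():
    scores = f @ f[u]                          # f(v) · f(u) for every v ∈ V
    log_Z_u = np.log(np.exp(scores).sum())     # log of the partition function Z_u
    for n_i in ns:
        objective += f[n_i] @ f[u] - log_Z_u   # log Pr(n_i | f(u))
print(objective)
```

In practice, optimizing this with SGD mirrors word2vec: the sampled walks play the role of sentences and the nodes play the role of words.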

The neighborhood N_S(u) is generated by a flexible sampling strategy: a biased random walk procedure that allows us to smoothly interpolate between BFS and DFS.

Random walk:

Let c_i denote the i-th node in the walk, starting with c_0 = u. Nodes c_i are generated by the following distribution:
P(c_i = x\mid c_{i-1} = v) = \frac{\pi_{vx}}{Z} if (v, x) ∈ E, and 0 otherwise,
where \pi_{vx} is the unnormalized transition probability between v and x, and Z is the normalizing constant.
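
As a sketch, one unbiased step of such a walk can be sampled as follows; here I assume π_vx is simply the edge weight (the biased, second-order version follows below), and the dict-of-dicts graph format is my own choice:

```python
import random

def walk_step(graph, v):
    """Sample the next node x with P(x | v) proportional to pi_vx (= edge weight here)."""
    neighbors = list(graph[v])                  # candidate nodes x with (v, x) ∈ E
    weights = [graph[v][x] for x in neighbors]  # pi_vx before normalizing by Z
    return random.choices(neighbors, weights=weights, k=1)[0]

# toy weighted graph as an adjacency dict: graph[v][x] = weight of edge (v, x)
graph = {0: {1: 1.0, 2: 2.0}, 1: {0: 1.0}, 2: {0: 2.0}}
print(walk_step(graph, 0))
```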

Search bias \alpha:

Define a 2nd-order random walk with two parameters p and q that guide the walk. Suppose the walk has just traversed edge (t, v) and now resides at node v. To decide on the next step, it evaluates the transition probabilities π_vx on edges (v, x) leading from v, setting \pi_{vx} = \alpha_{pq}(t, x) \cdot w_{vx}, where w_{vx} is the edge weight and

\alpha_{pq}(t,x) = \begin{cases} \frac{1}{p} & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ \frac{1}{q} & \text{if } d_{tx} = 2 \end{cases}

Here d_{tx} denotes the shortest-path distance between nodes t and x; since x must be a neighbor of v, d_{tx} can only be 0, 1, or 2.

p and q control how fast the walk explores and leaves the neighborhood of the starting node u.
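
Putting the pieces together, here is a minimal sketch of a full second-order node2vec walk with π_vx = α_pq(t, x) · w_vx; the function names and the dict-of-dicts graph format are my own (not the paper's reference code), and the graph is assumed undirected with symmetric adjacency:

```python
import random

def alpha(p, q, t, x, graph):
    """Search bias alpha_pq(t, x) from the case analysis above."""
    if x == t:             # d_tx = 0: stepping straight back to t
        return 1.0 / p
    if t in graph[x]:      # d_tx = 1: x is also a neighbor of t
        return 1.0
    return 1.0 / q         # d_tx = 2: x moves the walk away from t

def node2vec_walk(graph, start, length, p, q):
    walk = [start]
    for _ in range(length - 1):
        v = walk[-1]
        neighbors = list(graph[v])
        if not neighbors:
            break
        if len(walk) == 1:  # first step has no previous node t: use plain edge weights
            weights = [graph[v][x] for x in neighbors]
        else:
            t = walk[-2]    # the node we just came from defines the 2nd-order bias
            weights = [alpha(p, q, t, x, graph) * graph[v][x] for x in neighbors]
        walk.append(random.choices(neighbors, weights=weights, k=1)[0])
    return walk

graph = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0},
         2: {0: 1.0, 1: 1.0, 3: 1.0}, 3: {2: 1.0}}
print(node2vec_walk(graph, start=0, length=5, p=1.0, q=0.5))
```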


Return parameter p: p controls the likelihood of immediately revisiting a node in the walk.

Setting it to a high value (> max(q, 1)) ensures that we are less likely to sample an already-visited node in the following two steps (unless the next node in the walk has no other neighbor). This strategy encourages moderate exploration and avoids 2-hop redundancy in sampling.

On the other hand, if p is low (< min(q, 1)), the walk is more likely to backtrack a step, which keeps it "local", close to the starting node u.

In-out parameter q: q allows the search to differentiate between "inward" and "outward" nodes.

If q > 1, the random walk is biased towards nodes close to node t. Such walks obtain a local view of the underlying graph with respect to the start node and approximate BFS behavior, in the sense that the samples comprise nodes within a small locality.

If q < 1, the walk is more inclined to visit nodes that are further away from node t. Such behavior is reflective of DFS, which encourages outward exploration.
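
Continuing the hypothetical `node2vec_walk` sketch above, the effect of q can be probed directly (an illustrative toy run, not a benchmark):

```python
# q > 1 keeps walks near the start (BFS-like); q < 1 pushes them outward (DFS-like)
for q in (4.0, 0.25):
    print(q, node2vec_walk(graph, start=0, length=8, p=1.0, q=q))
```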


The transition probabilities π_vx for the 2nd-order Markov chain can be precomputed, so sampling the next node while simulating the random walk takes O(1) time using alias sampling.
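
As a sketch of why the O(1) claim holds, here is a standard alias-table construction (Walker's method): preprocessing a discrete distribution costs O(n), and every draw afterwards costs O(1). This is a generic textbook version, not the paper's code:

```python
import random

def build_alias_table(probs):
    """O(n) preprocessing for Walker's alias method over a discrete distribution."""
    n = len(probs)
    scaled = [p * n for p in probs]
    alias, prob = [0] * n, [0.0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]       # l donates mass to fill s's column
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:                # leftover entries have probability ~1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """O(1) sample: pick a uniform column, then flip one biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias_table([0.5, 0.3, 0.2])
print(alias_draw(prob, alias))
```

In node2vec, one such table can be precomputed per traversed-edge context (t, v), which is the memory-for-time trade-off behind the O(1) per-step sampling.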

Reference:

  1. Grover A, Leskovec J. node2vec: Scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 855-864.

Other related algorithms:

DeepWalk

LINE
