MultiSage: Empowering GCN with Contextualized Multi-Embeddings onWeb-Scale Multipartite Networks

Abstract

Existing GCNs mostly work on homogeneous graphs and consider a single embedding for each node, which do not sufficiently model the multi-facet nature and complex interaction of nodes in real world
networks. Here, author present a contextualized GCN engine by modeling the multipartite networks of target nodes and their intermediate context nodes that specify the contexts of their interactions.

Current Issues and problems

  • Nodes are connected in different ways thus close to each other in different ways, which cannot be simultaneously captured by a single embedding.
  • How to find proper context?:The real-world networks are often multipartite, it beneficial to model target nodes and context nodes in multipartite networks, where the interactions between the target nodes can be subtly modeled via the help of context nodes.
  • How to leverage context?:How to leverage context in massive real-world networks to facilitate
    effective and flexible downstream applications.

image-20200813111808135

Preliminaries

Traditional GCN: The output of the l+1l+1 th convolutional layer Hl+1\textbf{H}^{l+1} of GCN is computed as follows

image-20200813112017937

GraphSage: Sample a fixed number of neighbors in each convolution layer and aggregate the neighborhood embedding.

image-20200813113024588

Triplet-wise optimization objective based on max-margin ranking as follows

image-20200813113054830

where δ\delta is a margin parameter, vqv_q and vpv_p are the query and positive nodes, vnv_n is the negative node sampled from Pn(vq)P_n(v_q)

T\mathcal{T}:The number of the nodes type in graph

C\mathcal{C}:Context nodes set

Each uNvTu\in\mathcal{N}_v\subset\mathcal{T} of the ego vTv\in \mathcal{T} is associated with a dominant context node o(v,u)Co \sim(v,u)\in \mathcal{C}

MultiSage

In this work, Author leverage the heterogeneity of real-world networks by separating nodes into two main types: target nodes and context nodes. Our main focus is to learn embeddings of the target nodes, while using the context nodes to describe the relationship between the target nodes.

image-20200813113704204

Assumptions:

  1. minimum domain knowledge is available to separate target nodes from context nodes
  2. most important interactions among target nodes involve context nodes

image-20200813114154211

image-20200813114357968

In (a), when three neighbor pins are aggregated through mean pooling, the resulting neighborhood embedding simply lies in the center of the three pins, reflecting the same influence of all neighbors on the ego. In contrast in (b), two neighbor pins are connected via the fashion board while the other one via the crafts board, thus drawing the neighborhood embedding more into the fashion direction. Such contextualization over the target interaction is desirable, since each neighbor is similar to the ego from a particular perspective, and thus should influence the ego embedding more in the corresponding subspace.

image-20200813124137599

Raw feature transformation

learns to project and transform raw node features via stacked dense neural networks as follows

image-20200813115937275

ztz_t and zcz_c represent the embeddings of the target and the context nodes, respectively.

Contextual masking

Contextual masking allows us to project different ego-neighbor pairs into various embedding subspaces,
so as to emphasize contextualized interaction among target node pairs regarding particular embedding dimensions at the feature level.

Element-wise multiplying the embedding of the context node zc(o)z_c (o) onto the embeddings of both ego and neighbor target nodes zt(v)z_t (v) and zt(u)z_t (u) as follows

ztc=ztzcz_{t|c}=z_t\otimes z_c

Note that, since the last embedding layer of zcz_c is ReLU, certain dimensions can be learned to be zero, which effectively “masks out” irrelevant dimensions, so as to project the target embeddings into particular subspaces directly controlled by the context embeddings

Autho also introduce:

ztc=ztzcz_{t|c}=z_t\oplus z_c

ztc=ReLU(Wp(ztzc)+b)z_{t|c}=ReLU(W_p(z_t \odot z_c)+b)

where \odot means the concatenation.

Contextual attention

Author design a novel contextual attention mechanism to consider the overall impact of different neighbors of ego at the node level, by jointly computing an attention weight for each ego-context-neighbor triplet α(v,o,u)=\alpha (v,o,u)=

image-20200813122053635

where {a,Wat,Wac}\{a,W_{at},W_{ac}\} are the learnable attention parameters. τ\tau is the LeakyReLU activation function.

To further improve the capacity and stability, we exploit multi-head attention to compute the aggregated contextualized embedding as follows:

image-20200813123426905

Training objective

Author directly add the contextualized ego embedding zNv(v)z_{N_v} (v) and neighbor embedding $z_{N_v} (u) $as
the final MultiSage embedding h(v)h(v).

Training loss:

sampling positive neighbor and negative neighbor vp,vnv_p,v_n

image-20200813113054830

image-20200813130113062

这个加法是合理的,如果u,v是邻居,则huh_u=hvh_v,那么乘积就会很大,如果不是邻居,则huh_u 不等于hvh_v乘积会小,则loss是会变大的。

Contextualized random walk

由于每个游走序列会有多个中间节点,但是其实只是有一些中间节点是不重要的,于是对每个ego-target node pair只保留最重要的中间节点。

image-20200813151229871

vis[v]:记录节点访问次数

ctx[v]:记录中间节点和target节点

dom[v]:记录不同中间节点的重要性

line 7: 对所有的访问过的context node 记录增加

line8-15:运行多次random walk,记录出现target node情况下,出现最多的contex node。

Experiment

image-20200813154448878

image-20200813154608159

image-20200813155337550

可以使用mask去获取不同的候选商品,因为mask会剔除一些不同的特征,可以让目标映射到不同的子

image-20200813160743067