How Powerful are Graph Neural Networks?

GNNs follow a neighborhood aggregation scheme, where the representation vector of a node is computed by recursively aggregating and transforming representation vectors of its neighboring nodes.

There are two tasks of interest:

(1) Node classification

(2) Graph classification

Formally, the $k$-th layer of a GNN is

$$a_v^{(k)} = \text{AGGREGATE}^{(k)}\left(\left\{h_u^{(k-1)} : u \in \mathcal{N}(v)\right\}\right), \quad h_v^{(k)} = \text{COMBINE}^{(k)}\left(h_v^{(k-1)}, a_v^{(k)}\right),$$

where $h_v^{(k)}$ is the feature vector of node $v$ at the $k$-th iteration and $h_v^{(0)}$ is initialized to the node's input features.
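A minimal numpy sketch of this aggregate-and-combine scheme (the toy path graph, feature sizes, and the names `gnn_layer`, `aggregate`, `combine` are illustrative assumptions, not from the paper):

```python
import numpy as np

def gnn_layer(h, neighbors, aggregate, combine):
    # One generic layer: h_v <- COMBINE(h_v, AGGREGATE({h_u : u in N(v)}))
    return np.stack([
        combine(h[v], aggregate(h[neighbors[v]]))
        for v in range(len(h))
    ])

# Toy path graph 0 - 1 - 2 with 2-d node features (illustrative).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
h0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h1 = gnn_layer(h0, neighbors,
               aggregate=lambda H: H.sum(axis=0),  # sum over neighbors
               combine=lambda hv, av: hv + av)     # simple additive combine
```

Swapping in different `aggregate`/`combine` functions recovers the specific architectures discussed below.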

In the pooling variant of GraphSAGE, AGGREGATE has been formulated as

$$a_v^{(k)} = \text{MAX}\left(\left\{\text{ReLU}\left(W \cdot h_u^{(k-1)}\right), \forall u \in \mathcal{N}(v)\right\}\right),$$

where $W$ is a learnable matrix and MAX denotes element-wise max-pooling,

and COMBINE could be a concatenation followed by a linear mapping, $W \cdot \left[h_v^{(k-1)}, a_v^{(k)}\right]$.
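This pooling aggregator can be sketched as follows (the weight shapes and the toy graph are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W_pool = rng.normal(size=(2, 2))   # pooling weight (illustrative sizes)
W_comb = rng.normal(size=(2, 4))   # linear map applied after concatenation

def sage_pool_layer(h, neighbors):
    out = []
    for v in range(len(h)):
        Hn = h[neighbors[v]]                                   # neighbor features
        a_v = np.max(np.maximum(Hn @ W_pool.T, 0.0), axis=0)   # MAX(ReLU(W h_u))
        out.append(W_comb @ np.concatenate([h[v], a_v]))       # W · [h_v, a_v]
    return np.stack(out)

neighbors = {0: [1], 1: [0, 2], 2: [1]}
h0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h1 = sage_pool_layer(h0, neighbors)
```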

In Graph Convolutional Networks (GCN), element-wise mean pooling is used instead, and the AGGREGATE and COMBINE steps are integrated as

$$h_v^{(k)} = \text{ReLU}\left(W \cdot \text{MEAN}\left\{h_u^{(k-1)}, \forall u \in \mathcal{N}(v) \cup \{v\}\right\}\right).$$
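A sketch of this integrated mean-pooling step (again on an assumed toy graph, with $W$ set to the identity for readability):

```python
import numpy as np

def gcn_layer(h, neighbors, W):
    # h_v = ReLU(W · MEAN{h_u : u in N(v) ∪ {v}})
    out = []
    for v in range(len(h)):
        Hn = h[[v] + neighbors[v]]              # neighbors plus the node itself
        out.append(np.maximum(W @ Hn.mean(axis=0), 0.0))
    return np.stack(out)

neighbors = {0: [1], 1: [0, 2], 2: [1]}
h0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h1 = gcn_layer(h0, neighbors, W=np.eye(2))
```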

For graph classification, the READOUT function aggregates node features from the final iteration to obtain the entire graph's representation $h_G$:

$$h_G = \text{READOUT}\left(\left\{h_v^{(K)} \mid v \in G\right\}\right).$$

Definition 1 (Multiset). A multiset is a generalized concept of a set that allows multiple instances for its elements. More formally, a multiset is a 2-tuple $X = (S,m)$ where $S$ is the underlying set of $X$ that is formed from its distinct elements, and $m : S \rightarrow \mathbb{N}_{\geq 1}$ gives the multiplicity of the elements.

Lemma 2. Let $G_1$ and $G_2$ be any two non-isomorphic graphs. If a graph neural network $A : \mathcal{G} \rightarrow \mathbb{R}^d$ maps $G_1$ and $G_2$ to different embeddings, the Weisfeiler-Lehman graph isomorphism test also decides $G_1$ and $G_2$ are not isomorphic.

Theorem 3. Let $A : \mathcal{G} \rightarrow \mathbb{R}^d$ be a GNN. With a sufficient number of GNN layers, $A$ maps any graphs $G_1$ and $G_2$ that the Weisfeiler-Lehman test of isomorphism decides as non-isomorphic, to different embeddings if the following conditions hold:

a) $A$ aggregates and updates node features iteratively with

$$h_v^{(k)} = \phi\left(h_v^{(k-1)}, f\left(\left\{h_u^{(k-1)} : u \in \mathcal{N}(v)\right\}\right)\right),$$

where the function $f$, which operates on multisets, and $\phi$ are injective.

b) $A$'s graph-level readout, which operates on the multiset of node features $\{h_{v}^{(k)}\}$, is injective.

Lemma 4. Assume the input feature space $\mathcal{X}$ is countable. Let $g^{(k)}$ be the function parameterized by a GNN's $k$-th layer for $k = 1, \ldots, L$, where $g^{(1)}$ is defined on multisets $X \subset \mathcal{X}$ of bounded size. The range of $g^{(k)}$, i.e., the space of node hidden features $\{h_{v}^{(k)}\}$, is also countable for all $k = 1, \ldots, L$.

countable: A set $A$ is countable if it is finite or has the same cardinality as $\mathbb{N}$.

Lemma 5. Assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} \rightarrow \mathbb{R}^n$ so that $h(X) =\sum _{x\in X} f(x)$ is unique for each multiset $X \subset \mathcal{X}$ of bounded size. Moreover, any multiset function $g$ can be decomposed as $g (X) = \varphi(\sum _{x\in X} f(x))$ for some function $\varphi$.
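A concrete way to see Lemma 5: choose $f$ to be a one-hot embedding, so the sum recovers the exact multiplicity of every element (the alphabet size and the example multisets below are illustrative choices):

```python
import numpy as np

f = np.eye(5)                       # f(x) = one-hot e_x: an injective embedding

def h(X):
    # h(X) = Σ_{x∈X} f(x); with one-hot f this recovers exact multiplicities,
    # so h is injective on bounded multisets over {0, ..., 4}.
    return sum(f[x] for x in X)

X1, X2 = [0, 0, 1], [0, 1, 1]       # same underlying set, different multiplicities
```

Because the multiplicities differ, `h(X1)` and `h(X2)` differ, exactly as injectivity requires.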

Corollary 6. Assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} →\mathbb{R}^n$ so that for infinitely many choices of $\epsilon$, including all irrational numbers, $h(c,X) = (1 + \epsilon) · f(c) + \sum_{x \in X} f(x)$ is unique for each pair $(c,X)$, where $c \in \mathcal{X}$ and $X \subset \mathcal{X}$ is a multiset of bounded size. Moreover, any function $g$ over such pairs can be decomposed as $g (c,X) = \varphi((1 + \epsilon)·f(c) + \sum_{x\in X} f(x))$ for some function $\varphi$.
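Corollary 6 can be checked numerically with a one-hot $f$ and an irrational $\epsilon$ (a sketch; the alphabet size and the specific pairs are illustrative assumptions):

```python
import numpy as np

EPS = np.sqrt(2) - 1                # an irrational ε, as permitted by Corollary 6
f = np.eye(3)                       # f(x) = one-hot e_x

def h(c, X):
    # h(c, X) = (1 + ε)·f(c) + Σ_{x∈X} f(x)
    return (1 + EPS) * f[c] + sum(f[x] for x in X)

# Both pairs have the same combined multiset {0, 1, 1} but a different center c:
a = h(0, [1, 1])
b = h(1, [0, 1])
```

Note that a plain sum over the combined multiset would map both pairs to the same vector; the $(1+\epsilon)$ weighting on the center $c$ is what separates them.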

GIN - GRAPH ISOMORPHISM NETWORK

To update node representations, GIN uses

$$h_v^{(k)} = \text{MLP}^{(k)}\left(\left(1 + \epsilon^{(k)}\right) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\right).$$

The node representations learnt this way can be directly used for tasks like node classification and link prediction.

Readout function for graph classification tasks. Given embeddings of individual nodes, the readout function produces the embedding of the entire graph. GIN uses a sum readout at each iteration and concatenates across iterations:

$$h_G = \text{CONCAT}\left(\text{READOUT}\left(\left\{h_v^{(k)} \mid v \in G\right\}\right) \mid k = 0, 1, \ldots, K\right).$$
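Putting the GIN update and readout together as a sketch (the MLP weight shapes and toy graph are illustrative assumptions; a real implementation would train these weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(2, 4))  # illustrative MLP weights

def gin_layer(h, neighbors, eps=0.0):
    # h_v^(k) = MLP((1 + ε)·h_v^(k-1) + Σ_{u∈N(v)} h_u^(k-1))
    def mlp(x):
        return W2 @ np.maximum(W1 @ x, 0.0)
    return np.stack([
        mlp((1 + eps) * h[v] + h[neighbors[v]].sum(axis=0))
        for v in range(len(h))
    ])

def gin_readout(layer_states):
    # h_G = CONCAT(Σ_v h_v^(k) | k = 0, ..., K): per-iteration sum readouts, concatenated
    return np.concatenate([h.sum(axis=0) for h in layer_states])

neighbors = {0: [1], 1: [0, 2], 2: [1]}
h0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h1 = gin_layer(h0, neighbors)
h_G = gin_readout([h0, h1])
```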

LESS POWERFUL BUT STILL INTERESTING GNNS

GNN variants such as GCN and GraphSAGE do not satisfy the conditions in Theorem 3, and ablation studies are conducted on two aspects of the aggregation step:

*An ablation study typically refers to removing some “feature” of the model or algorithm, and seeing how that affects performance.

(1) 1-layer perceptrons instead of MLPs

We are interested in understanding whether 1-layer perceptrons are enough for graph learning.

Lemma 7. There exist finite multisets $X_1 \neq X_2$ so that for any linear mapping $W$, $\sum_{x\in X_1} \text{ReLU}(Wx) = \sum_{x\in X_2} \text{ReLU}(Wx)$.
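One concrete pair witnessing Lemma 7 (an illustrative choice, not necessarily the paper's example): the scalar multisets $X_1 = \{1, 3\}$ and $X_2 = \{2, 2\}$ collide under any linear map followed by ReLU, because each output coordinate $w$ gives $\text{ReLU}(w) + \text{ReLU}(3w) = 2\,\text{ReLU}(2w)$ (both are $4w$ for $w \geq 0$ and $0$ for $w < 0$):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 1))                     # a random linear mapping

def relu_sum(X):
    # Σ_{x∈X} ReLU(W x), the pooled representation from Lemma 7
    return sum(np.maximum(W @ np.array([x]), 0.0) for x in X)

s1, s2 = relu_sum([1.0, 3.0]), relu_sum([2.0, 2.0])
```

The sums `s1` and `s2` coincide for every draw of `W`, so a 1-layer perceptron aggregator cannot distinguish these multisets.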

(2) mean or max-pooling instead of the sum.

Mean learns distributions, and max-pooling learns sets with distinct elements.

Consider $X_1 = (S,m)$ and $X_2 = (S,k ·m)$, where $X_1$ and $X_2$ have the same set of distinct elements, but $X_2$ contains $k$ copies of each element of $X_1$.

Corollary 8. Assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} \rightarrow \mathbb{R}^n$ so that for $h(X) = \frac{1}{|X|}\sum_{x\in X} f(x)$, $h(X_1) = h(X_2)$ if and only if multisets $X_1$ and $X_2$ have the same distribution. That is, assuming $|X_2|\geq|X_1|$, we have $X_1 = (S,m)$ and $X_2 = (S,k\cdot m)$ for some $k \in \mathbb{N}_{\geq 1}$.
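A quick numeric check of the mean-pooling failure mode (one-hot features and the multiset pair are illustrative assumptions): with $k = 2$, $X_2$ repeats every element of $X_1$ twice, so both multisets have the same distribution and identical means.

```python
import numpy as np

f = np.eye(3)                       # injective one-hot features

def mean_pool(X):
    # h(X) = (1/|X|) Σ_{x∈X} f(x)
    return np.mean([f[x] for x in X], axis=0)

# X2 = (S, 2·m): same distribution over {0, 1}, so mean pooling collides.
X1, X2 = [0, 1], [0, 0, 1, 1]
```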

Corollary 9. Assume $\mathcal{X}$ is countable. Then there exists a function $f : \mathcal{X} \rightarrow \mathbb{R}^{\infty}$ so that for $h(X) = \max_{x\in X} f(x)$, $h(X_1) = h(X_2)$ if and only if $X_1$ and $X_2$ have the same underlying set.
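The analogous check for max-pooling (again with illustrative one-hot features): multisets with the same underlying set but different multiplicities collide, since the element-wise max ignores repeats.

```python
import numpy as np

f = np.eye(3)                       # injective one-hot features

def max_pool(X):
    # h(X) = element-wise max over {f(x) : x ∈ X}
    return np.max([f[x] for x in X], axis=0)

# Same underlying set {0, 1}, different multiplicities: max pooling agrees.
X1, X2 = [0, 1], [0, 1, 1]
```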