# How Powerful are Graph Neural Networks?

GNNs follow a neighborhood aggregation scheme, where the representation vector of a node is computed by recursively aggregating and transforming representation vectors of its neighboring nodes.

There are two tasks of interest:

(1) Node classification

(2) Graph classification

Formally, the $k$-th layer of a GNN is

$$a_v^{(k)} = \text{AGGREGATE}^{(k)}\left(\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\}\right), \qquad h_v^{(k)} = \text{COMBINE}^{(k)}\left(h_v^{(k-1)},\, a_v^{(k)}\right)$$

where $h_v^{(k)}$ is the feature vector of node $v$ at the $k$-th iteration and $\mathcal{N}(v)$ is the set of neighbors of $v$.
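The AGGREGATE/COMBINE scheme can be sketched in plain Python. This is a minimal illustration, not the paper's implementation; the toy graph, the sum aggregator, and the elementwise-sum combine are my choices:

```python
# One GNN layer: AGGREGATE over each node's neighbors, then COMBINE
# with the node's own previous feature. Toy example, illustrative only.

def aggregate(neighbor_feats):
    # Sum aggregation over the multiset of neighbor feature vectors.
    return [sum(col) for col in zip(*neighbor_feats)]

def combine(h_v, a_v):
    # A toy COMBINE: elementwise sum of the previous node feature
    # and the aggregated neighborhood message.
    return [a + b for a, b in zip(h_v, a_v)]

def gnn_layer(adj, h):
    # adj: {node: [neighbors]}, h: {node: feature vector}
    return {v: combine(h[v], aggregate([h[u] for u in adj[v]])) for v in adj}

adj = {0: [1, 2], 1: [0], 2: [0]}
h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
h1 = gnn_layer(adj, h0)   # one round of neighborhood aggregation
```

Stacking $K$ such layers lets each node's representation absorb information from its $K$-hop neighborhood.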

In the pooling variant of GraphSAGE, AGGREGATE has been formulated as

$$a_v^{(k)} = \text{MAX}\left(\left\{ \text{ReLU}\left(W \cdot h_u^{(k-1)}\right) : u \in \mathcal{N}(v) \right\}\right)$$

where $W$ is a learnable matrix and MAX is element-wise max-pooling,

and COMBINE could be concatenation followed by a linear mapping, $h_v^{(k)} = W \cdot \left[h_v^{(k-1)},\, a_v^{(k)}\right]$.

In Graph Convolutional Networks (GCN), element-wise *mean* pooling is used, and the AGGREGATE and COMBINE steps are integrated:

$$h_v^{(k)} = \text{ReLU}\left(W \cdot \text{MEAN}\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \cup \{v\} \right\}\right)$$
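A GCN-style layer can be sketched with numpy. The tiny graph and the identity weight matrix below are illustrative choices, not from the paper:

```python
import numpy as np

# GCN-style layer: element-wise MEAN over the node's neighbors plus the
# node itself, a linear map W, then ReLU.

def gcn_layer(adj, H, W):
    # adj: {node: neighbor indices}, H: (n, d) features, W: (d, d') weights
    out = np.zeros((H.shape[0], W.shape[1]))
    for v, nbrs in adj.items():
        mean = H[[v] + nbrs].mean(axis=0)   # MEAN over N(v) plus self
        out[v] = np.maximum(mean @ W, 0.0)  # ReLU(W . MEAN{...})
    return out

adj = {0: [1], 1: [0, 2], 2: [1]}
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H1 = gcn_layer(adj, H, np.eye(2))
```

Because the mean discards multiplicities relative to neighborhood size, this aggregator is weaker than the sum, which is exactly what the later ablations probe.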

For graph classification, the READOUT function aggregates node features from the final iteration to obtain the entire graph's representation $h_G$:

$$h_G = \text{READOUT}\left(\left\{ h_v^{(K)} \mid v \in G \right\}\right)$$

**Definition 1 (Multiset).** A multiset is a generalized concept of a set that allows multiple instances of its elements. More formally, a multiset is a 2-tuple $X = (S, m)$ where $S$ is the underlying set of $X$, formed from its distinct elements, and $m : S \rightarrow \mathbb{N}_{\geq 1}$ gives the multiplicity of the elements.
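In Python, the 2-tuple $(S, m)$ maps naturally onto `collections.Counter`, which makes the definition concrete:

```python
from collections import Counter

# A multiset (S, m): Counter keys are the underlying set S,
# Counter values are the multiplicities m.
X = Counter(["a", "a", "b"])          # S = {a, b}, m(a) = 2, m(b) = 1
assert X["a"] == 2

# Two multisets are equal iff they have the same elements with the
# same multiplicities, regardless of order.
assert Counter(["b", "a", "a"]) == X
assert Counter(["a", "b"]) != X        # different multiplicities
```

A node's neighborhood features form exactly such a multiset, since distinct neighbors can carry identical feature vectors.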

**Lemma 2.** Let $G_1$ and $G_2$ be any two non-isomorphic graphs. If a graph neural network $A : \mathcal{G} \rightarrow \mathbb{R}^d$ maps $G_1$ and $G_2$ to different embeddings, the Weisfeiler-Lehman graph isomorphism test also decides $G_1$ and $G_2$ are not isomorphic.

**Theorem 3.** Let $A : \mathcal{G} \rightarrow \mathbb{R}^d$ be a GNN. With a sufficient number of GNN layers, $A$ maps any graphs $G_1$ and $G_2$ that the Weisfeiler-Lehman test of isomorphism decides as non-isomorphic, to different embeddings if the following conditions hold:

a) $A$ aggregates and updates node features iteratively with

$$h_v^{(k)} = \phi\left(h_v^{(k-1)},\, f\left(\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\}\right)\right)$$

where the function $f$, which operates on multisets, and the function $\phi$ are injective.

b) $A$’s graph-level readout, which operates on the multiset of node features $\{h_{v}^{(k)}\}$, is injective.

**Lemma 4.** Assume the input feature space $\mathcal{X}$ is *countable*. Let $g^{(k)}$ be the function parameterized by a GNN’s $k$-th layer for $k = 1, \ldots, L$, where $g^{(1)}$ is defined on multisets $X \subset \mathcal{X}$ of bounded size. The range of $g^{(k)}$, i.e., the space of node hidden features $\{h_{v}^{(k)}\}$, is also countable for all $k = 1, \ldots, L$.

*Countable*: a set $A$ is *countable* if it is finite or has the same cardinality as $\mathbb{N}$.

**Lemma 5.** Assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} \rightarrow \mathbb{R}^n$ so that $h(X) = \sum_{x\in X} f(x)$ is unique for each multiset $X \subset \mathcal{X}$ of bounded size. Moreover, any multiset function $g$ can be decomposed as $g(X) = \varphi\left(\sum_{x\in X} f(x)\right)$ for some function $\varphi$.
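One way to see Lemma 5 concretely, for a finite alphabet of my choosing rather than the paper's general construction: take $f$ to be one-hot encoding, so $h(X) = \sum_{x \in X} f(x)$ is exactly the vector of multiplicities, which identifies the multiset uniquely.

```python
import numpy as np

# Lemma 5 illustration on a finite input space: f = one-hot encoding,
# so the sum over a multiset is its multiplicity vector, and h is injective.
alphabet = ["a", "b", "c"]   # assumed finite alphabet for the demo

def f(x):
    v = np.zeros(len(alphabet))
    v[alphabet.index(x)] = 1.0
    return v

def h(X):
    return sum(f(x) for x in X)

# The sum recovers the multiplicities: {a, a, b} -> (2, 1, 0).
assert np.array_equal(h(["a", "a", "b"]), [2.0, 1.0, 0.0])
# Different multisets therefore always get different sums.
assert not np.array_equal(h(["a", "b"]), h(["a", "a", "b"]))
```

The countable case in the paper needs a more careful $f$, but the intuition is the same: the sum preserves all multiplicity information.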

**Corollary 6.** Assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} \rightarrow \mathbb{R}^n$ so that for infinitely many choices of $\epsilon$, including all irrational numbers, $h(c, X) = (1 + \epsilon) \cdot f(c) + \sum_{x \in X} f(x)$ is unique for each pair $(c, X)$, where $c \in \mathcal{X}$ and $X \subset \mathcal{X}$ is a multiset of bounded size. Moreover, any function $g$ over such pairs can be decomposed as $g(c, X) = \varphi\left((1 + \epsilon) \cdot f(c) + \sum_{x\in X} f(x)\right)$ for some function $\varphi$.

## GIN - GRAPH ISOMORPHISM NETWORK

To update node representations, GIN uses

$$h_v^{(k)} = \text{MLP}^{(k)}\left(\left(1 + \epsilon^{(k)}\right) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\right)$$
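The GIN update can be sketched with numpy. The 2-layer ReLU MLP, the identity weights, and the fixed $\epsilon$ below are illustrative choices (the paper also considers learning $\epsilon$):

```python
import numpy as np

# GIN node update: h_v = MLP((1 + eps) * h_v + sum of neighbor features).

def mlp(x, W1, W2):
    # A toy 2-layer MLP with ReLU; weights here preserve the feature dim.
    return W2 @ np.maximum(W1 @ x, 0.0)

def gin_layer(adj, H, W1, W2, eps):
    out = np.zeros_like(H)
    for v, nbrs in adj.items():
        # (1 + eps) weighting keeps the central node distinguishable
        # from its neighbors (cf. Corollary 6).
        msg = (1.0 + eps) * H[v] + H[nbrs].sum(axis=0)
        out[v] = mlp(msg, W1, W2)
    return out

adj = {0: [1], 1: [0]}
H = np.array([[1.0, 0.0], [0.0, 1.0]])
H1 = gin_layer(adj, H, np.eye(2), np.eye(2), eps=0.5)
```

Sum aggregation (rather than mean or max) is what makes the multiset function injective, matching the conditions of Theorem 3.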

Node representations learned this way can be directly used for tasks like **node classification and link prediction**.

For graph classification tasks, a readout function is needed: given embeddings of individual nodes, it produces the embedding of the entire graph. GIN concatenates readouts (sums of node features) from all iterations:

$$h_G = \text{CONCAT}\left(\text{READOUT}\left(\left\{ h_v^{(k)} \mid v \in G \right\}\right) \;\middle|\; k = 0, 1, \ldots, K\right)$$

## LESS POWERFUL GNN VARIANTS

GNN variants such as GCN and GraphSAGE do not satisfy the conditions in Theorem 3. Ablation studies examine two such weakenings of the aggregator:

*An ablation study typically refers to removing some "feature" of the model or algorithm, and seeing how that affects performance.*

(1) 1-layer perceptrons instead of MLPs

We are interested in understanding whether 1-layer perceptrons are enough for graph learning.

**Lemma 7.** There exist finite multisets $X_1 \neq X_2$ so that for any linear mapping $W$, $\sum_{x\in X_1} \text{ReLU}(Wx) = \sum_{x\in X_2} \text{ReLU}(Wx)$.
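A concrete pair realizing Lemma 7 (this particular pair is my choice, in the spirit of the paper's proof): $X_1 = \{0, 2\}$ and $X_2 = \{1, 1\}$ collide under sum-of-ReLU for *every* linear map $W$, because ReLU is positively homogeneous, so $\text{ReLU}(2w) = 2\,\text{ReLU}(w)$ in each coordinate.

```python
import numpy as np

# X1 = {0, 2} and X2 = {1, 1}: same sum of ReLU(Wx) for any W.
X1 = [np.array([0.0]), np.array([2.0])]
X2 = [np.array([1.0]), np.array([1.0])]

rng = np.random.default_rng(42)
for _ in range(100):
    W = rng.normal(size=(5, 1))              # random linear mapping
    s1 = sum(np.maximum(W @ x, 0.0) for x in X1)
    s2 = sum(np.maximum(W @ x, 0.0) for x in X2)
    assert np.allclose(s1, s2)               # identical for every sampled W
# Yet X1 != X2 as multisets, so a 1-layer perceptron aggregator is not injective.
```

With a genuine MLP after the sum (as in GIN), the bias and second layer break this homogeneity, which is why MLPs suffice where 1-layer perceptrons fail.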

(2) mean or max-pooling instead of the sum.

Mean learns distributions, and max-pooling learns sets with distinct elements.

Consider $X_1 = (S,m)$ and $X_2 = (S,k ·m)$, where $X_1$ and $X_2$ have the same set of distinct elements, but $X_2$ contains $k$ copies of each element of $X_1$.
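This duplication scenario is easy to check numerically. The toy 2-D features below are my choice; with $k = 3$ copies of each element, the means coincide while the sums do not:

```python
import numpy as np

# Mean pooling cannot tell a multiset from its k-fold duplicate.
X1 = np.array([[1.0, 0.0], [0.0, 1.0]])
X2 = np.tile(X1, (3, 1))    # k = 3 copies of each row of X1

assert np.allclose(X1.mean(axis=0), X2.mean(axis=0))    # mean collides
assert not np.allclose(X1.sum(axis=0), X2.sum(axis=0))  # sum distinguishes
```

So mean aggregation captures the *distribution* of neighbor features but not the neighborhood size, which Corollary 8 below makes precise.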

**Corollary 8.** Assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} \rightarrow \mathbb{R}^n$ so that for $h(X) = \frac{1}{|X|}\sum_{x\in X} f(x)$, $h(X_1) = h(X_2)$ if and only if multisets $X_1$ and $X_2$ have the same distribution. That is, assuming $|X_2| \geq |X_1|$, we have $X_1 = (S, m)$ and $X_2 = (S, k \cdot m)$ for some $k \in \mathbb{N}_{\geq 1}$.

**Corollary 9.** Assume $\mathcal{X}$ is countable. Then there exists a function $f : \mathcal{X} \rightarrow \mathbb{R}^{\infty}$ so that for $h(X) = \max_{x\in X} f(x)$, $h(X_1) = h(X_2)$ if and only if $X_1$ and $X_2$ have the same underlying set.
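The max-pooling case is just as easy to verify numerically (toy features, my choice): duplicating elements never changes a coordinate-wise max, so only the underlying set survives.

```python
import numpy as np

# Max pooling sees only the underlying set of a multiset.
Xa = np.array([[1.0, 0.0], [0.0, 1.0]])               # multiset {a, b}
Xb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # multiset {a, a, b}

assert np.allclose(Xa.max(axis=0), Xb.max(axis=0))    # same underlying set
# A genuinely different underlying set is still distinguished:
Xc = np.array([[1.0, 0.0]])                           # set {a}
assert not np.allclose(Xa.max(axis=0), Xc.max(axis=0))
```

This ranks the three aggregators by expressive power: sum (multisets) > mean (distributions) > max (sets), which is exactly why GIN uses the sum.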