A complete guide to graph neural networks (GNNs): graph convolutions, message passing, and learning on graph-structured data.
Introduction: Graph Neural Networks
Most machine learning assumes data is independent or sequential.
But many real-world systems are fundamentally relational:
- Social networks (people connected by friendships)
- Molecules (atoms connected by bonds)
- Knowledge graphs (entities connected by relationships)
- Transportation networks (cities connected by routes)
- Recommendation systems (users and items connected by interactions)
Traditional neural networks struggle with graphs because:
- Irregular structure (not grid like images, not sequence like text)
- Variable size (graphs have different numbers of nodes)
- Relational dependencies (what matters is connections, not just node features)
Graph Neural Networks (GNNs) handle these challenges by learning to aggregate information from neighbors.
This guide covers GNNs end-to-end: from fundamentals to architectures to applications.
Graph Fundamentals
Graph Components
Nodes: Entities (people, atoms, cities)
Edges: Relationships (friendships, bonds, routes)
Features: Information on nodes/edges
Labels: What we want to predict
Social Network Example:
Nodes: People
Edges: Friendships
Node features: Profile info, age, interests
Labels: Future friendships (what we want to predict)
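A minimal sketch of how this social graph might be represented in Python; all names and feature values here are made up for illustration:

```python
# A tiny social graph as an edge list plus per-node features.
# All names and values are hypothetical.
nodes = ["alice", "bob", "carol", "dave"]

# Undirected friendships, stored once per pair.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]

# Node features: [age, number_of_interests]
features = {
    "alice": [34, 5],
    "bob":   [29, 3],
    "carol": [41, 7],
    "dave":  [25, 2],
}

# Labels on a subset of nodes (what we want to predict for the rest).
labels = {"alice": "accepts_requests", "dave": "ignores_requests"}
```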
Graph Types
Directed vs Undirected:
- Directed: Edges have direction (X follows Y)
- Undirected: Edges bidirectional (X and Y friends)
Weighted vs Unweighted:
- Weighted: Edges have strength (relationship strength)
- Unweighted: Binary (connected or not)
Homogeneous vs Heterogeneous:
- Homogeneous: One type of node, one type of edge
- Heterogeneous: Multiple types (users, products, interactions)
Why Graphs Matter
Real-World Structure
Many real-world systems are fundamentally graphs.
Examples:
- Social networks (billions of connections)
- Molecules (atoms and bonds)
- Knowledge bases (facts and relationships)
- Transportation systems (physical networks)
- Recommendation systems (users and items)
Relational Learning
What matters is structure, not just features.
Friendship prediction:
A is friends with B, C, and D
Want to predict whether A and E will become friends
Key: A and E share common friends (B, C, D)
Structure → Prediction
Scalability
Graphs capture complex relationships compactly.
Without a graph: storing all possible pairs is O(n²) (hundreds of billions for a million nodes)
With a graph: store only the edges that exist (tens of millions, even for large networks)
Efficient representation
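A rough back-of-the-envelope sketch of that gap in Python, assuming a hypothetical network of one million users averaging 100 friends each:

```python
n_nodes = 1_000_000
avg_degree = 100  # assumed average friends per user

dense_pairs = n_nodes * (n_nodes - 1) // 2  # every possible pair
edge_list = n_nodes * avg_degree // 2       # only pairs that exist

print(f"dense pairs: {dense_pairs:,}")  # ~5e11 entries
print(f"edge list:   {edge_list:,}")    # 5e7 entries, ~10,000x smaller
```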
Graph Convolutions
Motivation
Images: Convolutional networks aggregate over local neighborhoods (nearby pixels)
Graphs: Can do similar—aggregate from neighbors
Graph Convolution Idea: Node’s representation = Aggregate neighbors’ representations
Process
Node v has neighbors {u1, u2, u3}
Step 1: Get representations of neighbors
Step 2: Aggregate (sum, mean, max)
Step 3: Apply neural network
Step 4: Node v has new representation
Repeat for multiple layers
Mathematical Formulation
Graph Convolutional Network (GCN):
h_v^(l+1) = σ(W^(l) × mean({h_u^(l) : u ∈ neighbors(v)}))
Where:
h_v^(l) = representation of node v at layer l
W^(l) = learnable weight matrix at layer l
σ = activation function (ReLU, etc.)
Interpretation:
- Gather neighbors’ representations
- Average them (aggregate)
- Apply transformation (W)
- Non-linearity (σ)
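A minimal PyTorch sketch of this layer, using a dense adjacency matrix for readability (real implementations use sparse operations; all dimensions here are illustrative):

```python
import torch
import torch.nn as nn

class MeanGCNLayer(nn.Module):
    """One graph convolution: average neighbors, transform, apply ReLU."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # adj: (N, N) binary adjacency matrix; h: (N, in_dim)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        neighbor_mean = (adj @ h) / deg                  # mean of neighbor features
        return torch.relu(self.W(neighbor_mean))

# Toy usage: 4 nodes with 3-dim features
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 0., 1.],
                    [1., 0., 0., 0.],
                    [0., 1., 0., 0.]])
h = torch.randn(4, 3)
layer = MeanGCNLayer(3, 8)
print(layer(h, adj).shape)  # torch.Size([4, 8])
```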
Message Passing
General framework for GNNs.
Intuition
Nodes send messages to neighbors, aggregate them.
Node A sends: "Here's my state"
Node B sends: "Here's my state"
Node C sends: "Here's my state"
Node D aggregates: "From neighbors, here's update"
Node D updates based on neighbors
Three Steps
1. Message Generation: Each node creates message for neighbors.
m_uv = Message from u to v = f(h_u, h_v, edge_feature)
2. Aggregation: Node aggregates messages from all neighbors.
agg_v = Aggregate({m_uv : u ∈ neighbors(v)})
Usually sum or mean
3. Update: Node updates its representation.
h_v^(new) = Update(h_v^(old), agg_v)
Usually neural network
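The three steps as a generic Python sketch; `message_fn`, `aggregate_fn`, and `update_fn` are placeholders that a real GNN would implement with learned networks:

```python
def message_passing_step(h, neighbors, message_fn, aggregate_fn, update_fn):
    """One round of message passing over all nodes.

    h:         dict node -> current representation
    neighbors: dict node -> list of neighbor nodes
    """
    new_h = {}
    for v in h:
        # 1. Message generation: one message per incoming neighbor
        messages = [message_fn(h[u], h[v]) for u in neighbors[v]]
        # 2. Aggregation: combine messages (sum, mean, max, ...)
        agg = aggregate_fn(messages)
        # 3. Update: combine old state with aggregated messages
        new_h[v] = update_fn(h[v], agg)
    return new_h
```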
Example
Graph: A-B-C (A connected to B, B connected to C)
Layer 1 (all updates use the layer-0 states):
A sends to B: m_AB
C sends to B: m_CB
B aggregates: mean(m_AB, m_CB)
B updates: h_B^(1) = σ(W × aggregated)
B sends to C: m_BC
C aggregates: m_BC (B is C's only neighbor)
C updates: h_C^(1) = σ(W × aggregated)
A updates the same way from its only neighbor, B
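Running the `message_passing_step` sketch from the previous section on this A-B-C chain, with scalar states and hand-picked toy functions for readability:

```python
# Toy trace on the A-B-C chain with scalar states
h = {"A": 1.0, "B": 2.0, "C": 3.0}
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

h1 = message_passing_step(
    h, nbrs,
    message_fn=lambda hu, hv: hu,                      # message = sender's state
    aggregate_fn=lambda ms: sum(ms) / len(ms),         # mean aggregation
    update_fn=lambda old, agg: 0.5 * old + 0.5 * agg,  # blend old and new
)
print(h1)  # {'A': 1.5, 'B': 2.0, 'C': 2.5}
```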
GNN Architectures
Graph Convolutional Networks (GCN)
Simplest and most common.
h_v^(l+1) = σ(W^(l) × mean({h_u^(l) : u ∈ neighbors(v)}))
Pros: Simple, works well
Cons: Limited expressiveness, assumes equal neighbor importance
GraphSAGE (Graph SAmple and aggreGatE)
Sample neighbors for scalability.
Instead of using all neighbors, sample K random neighbors
Reduces computation for large graphs
Allows inference on unseen nodes
Graph Attention Networks (GAT)
Learn to weight neighbors by importance.
Instead of equal weighting:
Compute attention weight for each neighbor
Important neighbors: High weight
Unimportant neighbors: Low weight
Aggregate weighted neighbors
Advantage: Learns importance automatically
Disadvantage: More parameters, slower
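A side-by-side sketch of the three layer types using PyTorch Geometric (this assumes torch and torch_geometric are installed; the dimensions and toy graph are arbitrary):

```python
import torch
from torch_geometric.nn import GCNConv, SAGEConv, GATConv

# Edge index in COO format: edges A-B and B-C, both directions
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.randn(3, 16)  # 3 nodes, 16 features each

gcn = GCNConv(16, 32)           # equal-weight neighbor averaging
sage = SAGEConv(16, 32)         # designed for neighbor sampling at scale
gat = GATConv(16, 32, heads=4)  # learned attention weights per neighbor

print(gcn(x, edge_index).shape)   # torch.Size([3, 32])
print(sage(x, edge_index).shape)  # torch.Size([3, 32])
print(gat(x, edge_index).shape)   # torch.Size([3, 128]) (4 heads × 32)
```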
Spectral Methods
Use graph’s spectral properties (eigenvalues, eigenvectors).
Transform graph to spectral domain
Convolve in spectral domain
Inverse transform
Theoretically grounded but computationally expensive
Node Classification
Predict label for each node.
Example: Predict user’s age group from social network
Input: Social network graph, some users' ages
Task: Predict ages of unlabeled users
Solution: GNN learns from neighbors' ages
Semi-Supervised Learning
Leverage unlabeled data.
Labeled: 100 users out of 1M
Unlabeled: 999,900 users
Standard ML: Ignore unlabeled
GNN: Learn from structure, use unlabeled
Result: Better performance
Training
1. Forward pass: Compute predictions for all nodes
2. Loss: Only labeled nodes contribute
3. Backprop: Update weights
4. Repeat
Unlabeled nodes influence gradient indirectly (through structure)
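A sketch of that loop in PyTorch: the forward pass covers every node, but only the labeled mask contributes to the loss (the model's `(x, adj)` signature is a placeholder):

```python
import torch
import torch.nn.functional as F

def train_epoch(model, optimizer, x, adj, labels, labeled_mask):
    """One training step; `labeled_mask` marks the few labeled nodes."""
    model.train()
    optimizer.zero_grad()
    logits = model(x, adj)  # forward pass over ALL nodes
    loss = F.cross_entropy(logits[labeled_mask],
                           labels[labeled_mask])  # loss on labeled nodes only
    loss.backward()   # unlabeled nodes still shape the gradient
    optimizer.step()  # through the neighbor aggregation
    return loss.item()
```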
Link Prediction
Predict if edge should exist.
Example: Predict friend recommendations (will X and Y be friends?)
Input: Current friendship network
Task: Predict future friendships
Solution: GNN computes node representations
Link score: Similarity of representations
High similarity → Likely friends
Methods
Approach 1: Similarity
Link score = similarity(h_X, h_Y)
Cosine, dot product, etc.
Approach 2: Neural Network
Link score = NN(h_X, h_Y)
More expressive
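Both approaches as a sketch, given embeddings h_X and h_Y produced by a GNN (the MLP sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# Approach 1: similarity of embeddings (no extra parameters)
def dot_score(h_x, h_y):
    return torch.sigmoid((h_x * h_y).sum(-1))  # high value -> likely link

# Approach 2: a small neural network on the concatenated pair
class LinkMLP(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h_x, h_y):
        return torch.sigmoid(self.net(torch.cat([h_x, h_y], dim=-1)))

h_x, h_y = torch.randn(32), torch.randn(32)
print(dot_score(h_x, h_y))    # scalar link probability
print(LinkMLP(32)(h_x, h_y))  # learned, more expressive score
```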
Graph Classification
Classify entire graph (not individual nodes).
Example: Classify molecules as drug candidates
Input: Molecular graph (atoms as nodes, bonds as edges)
Task: Is this a good drug candidate?
Solution: GNN processes entire graph
Global representation: Aggregate all node representations
Classification: Neural network on global representation
Methods
Global Pooling:
1. Compute node representations
2. Aggregate all nodes (sum, mean, max)
3. Classify global representation
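A global-pooling classifier sketch with PyTorch Geometric; the `batch` vector maps each node to the graph it belongs to (sizes are illustrative):

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class GraphClassifier(torch.nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index, batch):
        # 1. Node representations via two graph convolutions
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        # 2. Aggregate all nodes of each graph into one vector
        g = global_mean_pool(x, batch)
        # 3. Classify the global representation
        return self.head(g)
```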
Hierarchical Pooling:
1. Compute node representations
2. Select important nodes (pooling layer)
3. Aggregate selected nodes
4. Repeat multiple times
5. Final classification
More sophisticated, captures graph structure
Challenges and Limitations
Over-Smoothing
Deep GNNs suffer from over-smoothing.
Layer 1: Different node representations
Layer 2: Some aggregation
...
Layer 10: All nodes have nearly identical representations
Lost node-specific information
Solutions: skip connections, shallower networks, normalization (see the sketch below)
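One common mitigation as a sketch: wrap each layer in a residual (skip) connection, reusing the hypothetical MeanGCNLayer from the graph convolution sketch above:

```python
import torch.nn as nn

class ResidualGCNBlock(nn.Module):
    """GCN layer plus a skip connection, so deep stacks keep node identity."""

    def __init__(self, dim):
        super().__init__()
        self.layer = MeanGCNLayer(dim, dim)  # defined in the earlier sketch

    def forward(self, h, adj):
        # Adding the input back preserves node-specific information
        return h + self.layer(h, adj)
```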
Scalability
Large graphs challenging.
Millions of nodes, billions of edges
Naive aggregation: O(degree) per node per layer, and receptive fields grow exponentially with depth
Solution: Sampling, sparse operations, distributed training
Heterogeneous Graphs
Multiple node/edge types.
Social network: Users, posts, comments, likes
Different message passing for different types
Requires specialized architectures
Key Takeaways
✓ Graphs fundamental structure – Many real-world systems relational
✓ GNNs aggregate from neighbors – Message passing framework
✓ Message passing general – Message, aggregate, update cycle
✓ Multiple architectures – GCN, GraphSAGE, GAT, etc.
✓ Node classification common – Semi-supervised learning
✓ Link prediction useful – Recommendation, knowledge base completion
✓ Graph classification possible – For whole-graph properties
✓ Scalability challenging – Large graphs need special handling
✓ Over-smoothing problem – Deep GNNs lose information
✓ Active research area – Rapid improvements
Related Articles
- Deep Learning: Neural Networks
- Recommendation Systems: Building Personalized Recommendations
- Knowledge Graphs and Semantic Learning
Frequently Asked Questions
Q: Should I use GNN for my data?
A: If data naturally graph-structured (relational) → Yes. Otherwise, may not help.
Q: Which architecture is best?
A: Depends. GCN simplest, GAT most flexible, GraphSAGE most scalable. Try multiple.
Q: How deep should my GNN be?
A: 2-3 layers usually good. Deeper risks over-smoothing. Use skip connections for depth.
Q: Can GNN handle dynamic graphs?
A: Yes, with modifications. Temporal graph networks, dynamic GNNs exist.
Q: Is GNN better than traditional methods?
A: Often yes for relational data. But simpler methods sometimes sufficient. Try both.

