<node id="689723">
  <nid>689723</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1776104929</created>
  <changed>1776104964</changed>
  <title><![CDATA[PhD Defense by Muhammed Fatih Balın]]></title>
  <body><![CDATA[<p><strong>Title:</strong> Composable Algorithms for Scalable GNN Training</p>
<p>&nbsp;</p>
<p><strong>Date:</strong> Thursday, April 16, 2026</p>
<p><strong>Time:</strong> 11:00 AM -- 1:00 PM EDT</p>
<p><strong>Location:</strong> Coda C1315 Grant Park</p>
<p>&nbsp;</p>
<p><strong>Virtual Meeting:</strong> <a href="https://gatech.zoom.us/j/97947665601?pwd=z59yutkDgLkCbxX5972yfTUG3LwCSU.1&amp;from=addon"><strong>Zoom</strong></a></p>
<p><strong>Meeting ID:</strong> 979 4766 5601</p>
<p><strong>Passcode:</strong> 609215</p>
<p>&nbsp;</p>
<p><strong>Muhammed Fatih Balın</strong></p>
<p>CS Ph.D. Candidate</p>
<p>School of Computational Science and Engineering</p>
<p>College of Computing</p>
<p>Georgia Institute of Technology</p>
<p><a href="https://mfbal.in/">mfbal.in</a></p>
<p>&nbsp;</p>
<p><strong>Committee</strong></p>
<p>Dr. Ümit V. Çatalyürek (Advisor) -- School of Computational Science and Engineering, Georgia Institute of Technology</p>
<p>Dr. Yunan Luo -- School of Computational Science and Engineering, Georgia Institute of Technology</p>
<p>Dr. Bo Dai -- School of Computational Science and Engineering, Georgia Institute of Technology</p>
<p>Dr. Pan Li -- School of Electrical and Computer Engineering, Georgia Institute of Technology</p>
<p>Dr. Yingyan (Celine) Lin -- School of Computer Science, Georgia Institute of Technology</p>
<p>&nbsp;</p>
<p><strong>Abstract</strong></p>
<p>Training Graph Neural Networks (GNNs) at a large scale requires significant computational resources. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. Existing methods encounter a significant scaling bottleneck due to the Neighborhood Explosion Phenomenon (NEP). This thesis focuses on designing composable algorithms that mitigate the effects of NEP and presents a GNN training framework incorporating optimized implementations of these algorithms.</p>
<p>&nbsp;</p>
<p>First, we present LAyer-neighBOR sampling (LABOR), a novel sampling algorithm designed as a direct replacement for Neighbor Sampling (NS): with the same fan-out hyperparameter, it samples up to 7x fewer vertices without sacrificing quality. LABOR is designed so that, for each individual vertex, the variance of its estimator matches that of NS. Moreover, under the same vertex sampling budget, LABOR converges faster than existing layer sampling approaches and can use up to 112x larger batch sizes than NS.</p>
<p>&nbsp;</p>
<p>Second, we present Cooperative Minibatching. We leverage the observation that the size of the sampled subgraph is a sublinear, concave function of the batch size, so the workload executed per seed vertex shrinks as the batch size grows. Hence, it is favorable for processors equipped with a fast interconnect to work on one large minibatch together instead of on separate smaller minibatches. We also show how to exploit the same phenomenon in serial execution by proposing Dependent Consecutive Minibatches. Our experimental evaluations demonstrate that increasing this dependency yields up to a fourfold reduction in the bandwidth required to fetch vertex embeddings, without negatively impacting model convergence.</p>
<p>&nbsp;</p>
<p>Third, we present <a href="https://www.dgl.ai/dgl_docs/api/python/dgl.graphbolt.html">GraphBolt</a>, an end-to-end GPU-accelerated data loading framework for large-scale GNN training. GraphBolt integrates LABOR sampling and Cooperative Minibatching within a composable pipeline built on PyTorch DataPipes, alongside system-level optimizations including software pipelining via asynchronous futures, dynamic multi-level feature caching across GPU, CPU, and SSD tiers, incremental graph caching, and GPU-accelerated sampling kernels with edge-parallel load balancing. GraphBolt requires no preprocessing for its caches and supports heterogeneous graphs, temporal graphs, node classification, and link prediction through shared, composable implementations.</p>
<p>&nbsp;</p>
<p>All of the presented algorithms are composable: the savings multiply as these approaches are combined. GraphBolt demonstrates this by combining all of these techniques in a unified framework with optimized implementations, achieving significantly higher training throughput than existing frameworks such as DGL and PyG on large-scale GNN training workloads.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Composable Algorithms for Scalable GNN Training]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>Composable Algorithms for Scalable GNN Training</p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2026-04-16T11:00:00-04:00]]></value>
      <value2><![CDATA[2026-04-16T13:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Coda C1315 Grant Park]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
