{"689723":{"#nid":"689723","#data":{"type":"event","title":"PhD Defense by Muhammed Fatih Bal\u0131n","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle:\u0026nbsp;\u003C\/strong\u003EComposable Algorithms for Scalable GNN Training\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EDate:\u0026nbsp;\u003C\/strong\u003EThursday, April 16, 2026\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ETime:\u003C\/strong\u003E\u0026nbsp;11:00 AM -- 1:00 PM EDT\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ELocation:\u003C\/strong\u003E\u0026nbsp;Coda C1315 Grant Park\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EVirtual Meeting:\u0026nbsp;\u003C\/strong\u003E\u003Ca href=\u0022https:\/\/gatech.zoom.us\/j\/97947665601?pwd=z59yutkDgLkCbxX5972yfTUG3LwCSU.1\u0026amp;from=addon\u0022 title=\u0022https:\/\/gatech.zoom.us\/j\/97947665601?pwd=z59yutkDgLkCbxX5972yfTUG3LwCSU.1\u0026amp;from=addon\u0022\u003E\u003Cstrong\u003EZoom\u003C\/strong\u003E\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EMeeting ID:\u0026nbsp;\u003C\/strong\u003E979 4766 5601\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EPasscode:\u0026nbsp;\u003C\/strong\u003E609215\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EMuhammed Fatih Bal\u0131n\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003ECS Ph.D. 
Candidate\u003C\/p\u003E\u003Cp\u003ESchool of Computational Science and Engineering\u003C\/p\u003E\u003Cp\u003ECollege of Computing\u003C\/p\u003E\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\u003Cp\u003E\u003Ca href=\u0022https:\/\/mfbal.in\u0022 title=\u0022https:\/\/mfbal.in\u0022\u003Emfbal.in\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ECommittee\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EDr. \u00dcmit V. \u00c7ataly\u00fcrek (Advisor) -- School of Computational Science and Engineering, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Yunan Luo -- School of Computational Science and Engineering, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Bo Dai -- School of Computational Science and Engineering, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Pan Li -- School of Electrical and Computer Engineering, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003EDr. Yingyan (Celine) Lin -- School of Computer Science, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EAbstract\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003ETraining Graph Neural Networks (GNNs) at a large scale requires significant computational resources. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. 
Existing methods encounter a significant scaling bottleneck due to the Neighborhood Explosion Phenomenon (NEP). This thesis focuses on designing composable algorithms aimed at mitigating the effects of NEP and presents a GNN training framework incorporating optimized implementations of these algorithms.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EFirst, we present a novel sampling algorithm, LAyer-neighBOR (LABOR), which is designed to be a direct replacement for Neighbor Sampling (NS) with the same fan-out hyperparameter while sampling up to 7x fewer vertices, without sacrificing quality. LABOR is designed so that, for each individual vertex, the variance of its estimator matches that of NS. Moreover, under the same vertex sampling budget constraints, LABOR converges faster than existing layer sampling approaches and can use up to a 112x larger batch size compared to NS.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ESecond, we present Cooperative Minibatching. We leverage the observation that the size of the sampled subgraph is a sublinear and concave function of the batch size, leading to noteworthy reductions in the workload executed per seed vertex as the batch size increases. Hence, it is favorable for processors equipped with a fast interconnect to work on one large minibatch together instead of on separate smaller minibatches. We also show how to take advantage of the same phenomenon in serial execution by proposing Dependent Consecutive Minibatches. 
Our experimental evaluations demonstrate that increasing this dependency results in up to a fourfold reduction in bandwidth requirements for fetching vertex embeddings, without negatively impacting model convergence.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EThird, we present \u003Ca href=\u0022https:\/\/www.dgl.ai\/dgl_docs\/api\/python\/dgl.graphbolt.html\u0022 title=\u0022https:\/\/www.dgl.ai\/dgl_docs\/api\/python\/dgl.graphbolt.html\u0022\u003EGraphBolt\u003C\/a\u003E, an end-to-end GPU-accelerated data loading framework for large-scale GNN training. 
GraphBolt integrates LABOR sampling and Cooperative Minibatching within a composable pipeline built on PyTorch DataPipes, alongside system-level optimizations including software pipelining via asynchronous futures, dynamic multi-level feature caching across GPU, CPU, and SSD tiers, incremental graph caching, and GPU-accelerated sampling kernels with edge-parallel load balancing. GraphBolt requires no preprocessing for its caches and supports heterogeneous graphs, temporal graphs, node classification, and link prediction through shared, composable implementations.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EAll of the algorithms presented above are composable, meaning that the savings increase multiplicatively as these approaches are combined. GraphBolt demonstrates this by combining all of these techniques in a unified framework with optimized implementations, achieving significantly higher training throughput than existing frameworks such as DGL and PyG on large-scale GNN training workloads.\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EComposable Algorithms for Scalable GNN Training\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Composable Algorithms for Scalable GNN Training"}],"uid":"27707","created_gmt":"2026-04-13 18:28:49","changed_gmt":"2026-04-13 18:29:24","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2026-04-16T11:00:00-04:00","event_time_end":"2026-04-16T13:00:00-04:00","event_time_end_last":"2026-04-16T13:00:00-04:00","gmt_time_start":"2026-04-16 15:00:00","gmt_time_end":"2026-04-16 
17:00:00","gmt_time_end_last":"2026-04-16 17:00:00","rrule":null,"timezone":"America\/New_York"},"location":"Coda C1315 Grant Park","extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}