<node id="689723">
  <nid>689723</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1776104929</created>
  <changed>1776104964</changed>
  <title><![CDATA[PhD Defense by Muhammed Fatih Balın]]></title>
  <body><![CDATA[<p><strong>Title:</strong> Composable Algorithms for Scalable GNN Training</p>
<p>&nbsp;</p>
<p><strong>Date:</strong> Thursday, April 16, 2026</p>
<p><strong>Time:</strong> 11:00 AM -- 1:00 PM EDT</p>
<p><strong>Location:</strong> Coda C1315 Grant Park</p>
<p>&nbsp;</p>
<p><strong>Virtual Meeting:</strong> <a href="https://gatech.zoom.us/j/97947665601?pwd=z59yutkDgLkCbxX5972yfTUG3LwCSU.1&amp;from=addon"><strong>Zoom</strong></a></p>
<p><strong>Meeting ID:</strong> 979 4766 5601</p>
<p><strong>Passcode:</strong> 609215</p>
<p>&nbsp;</p>
<p><strong>Muhammed Fatih Balın</strong></p>
<p>CS Ph.D. Candidate</p>
<p>School of Computational Science and Engineering</p>
<p>College of Computing</p>
<p>Georgia Institute of Technology</p>
<p><a href="https://mfbal.in/">mfbal.in</a></p>
<p>&nbsp;</p>
<p><strong>Committee</strong></p>
<p>Dr. Ümit V. Çatalyürek (Advisor) -- School of Computational Science and Engineering, Georgia Institute of Technology</p>
<p>Dr. Yunan Luo -- School of Computational Science and Engineering, Georgia Institute of Technology</p>
<p>Dr. Bo Dai -- School of Computational Science and Engineering, Georgia Institute of Technology</p>
<p>Dr. Pan Li -- School of Electrical and Computer Engineering, Georgia Institute of Technology</p>
<p>Dr. Yingyan (Celine) Lin -- School of Computer Science, Georgia Institute of Technology</p>
<p>&nbsp;</p>
<p><strong>Abstract</strong></p>
<p>Training Graph Neural Networks (GNNs) at a large scale requires significant computational resources. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. Existing methods encounter a significant scaling bottleneck due to the Neighborhood Explosion Phenomenon (NEP). This thesis focuses on designing composable algorithms that mitigate the effects of NEP and presents a GNN training framework incorporating optimized implementations of these algorithms.</p>
<p>&nbsp;</p>
<p>First, we present LAyer-neighBOR sampling (LABOR), a novel sampling algorithm designed as a direct replacement for Neighbor Sampling (NS): with the same fan-out hyperparameter, it samples up to 7x fewer vertices without sacrificing quality. LABOR is designed so that, for each individual vertex, the variance of its estimator matches that of NS. Moreover, under the same vertex sampling budget, LABOR converges faster than existing layer sampling approaches and can use up to 112x larger batch sizes than NS.</p>
<p>&nbsp;</p>
<p>Second, we present Cooperative Minibatching. We leverage the observation that the size of the sampled subgraph is a sublinear, concave function of the batch size, so the workload executed per seed vertex shrinks as the batch size grows. Hence, it is favorable for processors equipped with a fast interconnect to work on one large minibatch together instead of on separate smaller minibatches. We also show how to exploit the same phenomenon in serial execution by proposing Dependent Consecutive Minibatches. Our experimental evaluations demonstrate that increasing this dependency yields up to a fourfold reduction in the bandwidth required to fetch vertex embeddings, without negatively impacting model convergence.</p>
<p>&nbsp;</p>
<p>Third, we present <a href="https://www.dgl.ai/dgl_docs/api/python/dgl.graphbolt.html">GraphBolt</a>, an end-to-end GPU-accelerated data loading framework for large-scale GNN training. GraphBolt integrates LABOR sampling and Cooperative Minibatching within a composable pipeline built on PyTorch DataPipes, alongside system-level optimizations including software pipelining via asynchronous futures, dynamic multi-level feature caching across GPU, CPU, and SSD tiers, incremental graph caching, and GPU-accelerated sampling kernels with edge-parallel load balancing. GraphBolt requires no preprocessing for its caches and supports heterogeneous graphs, temporal graphs, node classification, and link prediction through shared, composable implementations.</p>
<p>&nbsp;</p>
<p>All of the presented algorithms are composable: the savings multiply as these approaches are combined. GraphBolt demonstrates this by combining all of these techniques in a unified framework with optimized implementations, achieving significantly higher training throughput than existing frameworks such as DGL and PyG on large-scale GNN training workloads.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Composable Algorithms for Scalable GNN Training]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>Composable Algorithms for Scalable GNN Training</p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2026-04-16T11:00:00-04:00]]></value>
      <value2><![CDATA[2026-04-16T13:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Coda C1315 Grant Park]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
