<node id="686893">
  <nid>686893</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1765820268</created>
  <changed>1765820310</changed>
  <title><![CDATA[PhD Defense by Ali Hassani]]></title>
  <body><![CDATA[<p><strong>Title:</strong>&nbsp;Neighborhood Attention: Fast and Flexible Sparse Attention</p><p>&nbsp;</p><p><strong>Ali Hassani</strong></p><p>Ph.D. Student in Computer Science</p><p>School of Interactive Computing</p><p>Georgia Institute of Technology</p><p><a href="https://alihassanijr.com" target="_blank" title="https://alihassanijr.com">alihassanijr.com</a></p><p>&nbsp;</p><p><strong>Date:</strong>&nbsp;Wednesday, January 7th, 2026</p><p><strong>Time:</strong>&nbsp;13:00-15:00 EST</p><p>&nbsp;</p><p><strong>Location:</strong>&nbsp;Coda C1115 Druid Hills</p><p>Remote option (Zoom):</p><p>    <a href="https://gatech.zoom.us/j/92667338016">https://gatech.zoom.us/j/92667338016</a></p><p>    Meeting ID: 926 6733 8016</p><p>&nbsp;</p><p><strong>Committee:</strong></p><p>Dr. <a href="mailto:shi@gatech.edu" id="OWAAM258731">Humphrey Shi</a>&nbsp;(Advisor)&nbsp;- School of Interactive Computing, Georgia Institute of Technology</p><p>Dr. <a href="mailto:w-hwu@illinois.edu" id="OWAAM518120">Wen-mei Hwu</a>&nbsp;- Electrical &amp; Computer Engineering, University of Illinois at Urbana-Champaign</p><p>Dr. <a href="mailto:kartikgo@gatech.edu" id="OWAAM212107">Kartik Goyal</a>&nbsp;- School of Interactive Computing, Georgia Institute of Technology</p><p>Dr. <a href="mailto:judy@gatech.edu" id="OWAAM846771">Judy Hoffman</a>&nbsp;- School of Interactive Computing, Georgia Institute of Technology</p><p>Dr. <a href="mailto:zkira@gatech.edu" id="OWAAM641284">Zsolt Kira</a>&nbsp;- School of Interactive Computing, Georgia Institute of Technology</p><p>&nbsp;</p><p>&nbsp;</p><p><strong>Abstract:</strong></p><p>Attention is at the heart of most foundational AI models, across tasks and modalities. In many of those cases, it incurs a significant amount of computation, which is quadratic in complexity and is often cited as one of its greatest limitations. 
As a result, many sparse approaches have been proposed to alleviate this issue, one of the most common being a masked or reduced attention span.</p><p>In this work, we revisit sliding window approaches, which were commonly believed to be inherently inefficient, and we propose a new framework called Neighborhood Attention (NA). Through it, we solve design flaws in the original sliding window attention works, attempt to implement the approach efficiently for modern hardware accelerators, specifically GPUs, and conduct experiments that highlight the strengths and weaknesses of these approaches. At the same time, we bridge the parameterization and properties of convolution and attention by showing that NA exhibits inductive biases and receptive fields similar to those of convolutions, while remaining capable of capturing inter-dependencies, both short and long range, as attention does.</p><p>We then show the necessity of, and the challenges that arise from, infrastructure, especially in the context of modern implementations such as Flash Attention, and develop even more efficient and performance-optimized implementations for NA, specifically for the most recent and popular AI hardware accelerators, the NVIDIA Hopper and Blackwell GPUs.</p><p>We build models based on the NA family, highlighting its superior quality and efficiency compared to existing approaches, and also plug NA into existing foundational models, showing that it can accelerate those models by up to 1.6× end-to-end without further training, and up to 2.6× end-to-end with training. 
We further demonstrate that our methodology can create sparse attention patterns that realize the theoretical limit of their speedups.</p><p>This work is open-sourced through the NATTEN project at natten.org.</p><p>&nbsp;</p><p>&nbsp;</p><p><strong>Thesis PDF:</strong> <a href="https://alihassanijr.com/files/Hassani-Dissertation-2025-10-11.pdf">https://alihassanijr.com/files/Hassani-Dissertation-2025-10-11.pdf</a></p><p>&nbsp;</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Neighborhood Attention: Fast and Flexible Sparse Attention]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>Neighborhood Attention: Fast and Flexible Sparse Attention</p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2026-01-07T13:00:00-05:00]]></value>
      <value2><![CDATA[2026-01-07T15:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Coda C1115 Druid Hills]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[Phd Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
