
PhD Defense by Ali Hassani


Title: Neighborhood Attention: Fast and Flexible Sparse Attention

 

Ali Hassani

Ph.D. Student in Computer Science

School of Interactive Computing

Georgia Institute of Technology

alihassanijr.com

 

Date: Wednesday, January 7th, 2026

Time: 13:00-15:00 EST

 

Location: Coda C1115 Druid Hills

Remote option (Zoom):

    https://gatech.zoom.us/j/92667338016

    Meeting ID: 926 6733 8016

 

Committee:

Dr. Humphrey Shi (Advisor) - School of Interactive Computing, Georgia Institute of Technology

Dr. Wen-mei Hwu - Electrical & Computer Engineering, University of Illinois at Urbana-Champaign

Dr. Kartik Goyal - School of Interactive Computing, Georgia Institute of Technology

Dr. Judy Hoffman - School of Interactive Computing, Georgia Institute of Technology

Dr. Zsolt Kira - School of Interactive Computing, Georgia Institute of Technology

 

 

Abstract:

Attention is at the heart of most foundational AI models, across tasks and modalities. In many of those cases, it incurs a significant amount of computation, quadratic in complexity, which is often cited as one of its greatest limitations. As a result, many sparse approaches have been proposed to alleviate this issue, one of the most common being masked or reduced attention span.

In this work, we revisit sliding window approaches, which were commonly believed to be inherently inefficient, and propose a new framework called Neighborhood Attention (NA). Through it, we resolve design flaws in the original sliding window attention works, attempt to implement the approach efficiently for modern hardware accelerators, specifically GPUs, and conduct experiments that highlight the strengths and weaknesses of these approaches. At the same time, we bridge the parameterization and properties of convolution and attention by showing that NA exhibits inductive biases and receptive fields similar to those of convolutions, while remaining capable of capturing both short- and long-range dependencies, like attention.
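As a rough illustration of the mechanism, the sketch below computes 1-D neighborhood attention in pure PyTorch: each query attends only to a fixed-size window of keys centered on it, with the window clamped at sequence boundaries so every token keeps a full-size neighborhood. This is a minimal, naive sketch for exposition; it is not the NATTEN implementation, which relies on fused GPU kernels, and the function and parameter names here are illustrative only.

    import torch

    def neighborhood_attention_1d(q, k, v, kernel_size=7):
        """Naive 1-D neighborhood attention. q, k, v: (batch, seq_len, dim)."""
        b, n, d = q.shape
        assert kernel_size % 2 == 1 and kernel_size <= n
        radius = kernel_size // 2
        scale = d ** -0.5
        out = torch.empty_like(q)
        for i in range(n):
            # Clamp the window at the boundaries so every query still sees
            # exactly `kernel_size` neighbors, keeping the attention span fixed.
            start = min(max(i - radius, 0), n - kernel_size)
            k_win = k[:, start:start + kernel_size]    # (b, kernel_size, d)
            v_win = v[:, start:start + kernel_size]
            attn = (q[:, i:i + 1] @ k_win.transpose(1, 2)) * scale
            out[:, i] = (attn.softmax(dim=-1) @ v_win).squeeze(1)
        return out

    q = k = v = torch.randn(2, 32, 64)
    print(neighborhood_attention_1d(q, k, v).shape)  # torch.Size([2, 32, 64])

Note that the per-query Python loop is only for clarity; the point of this dissertation's implementations is to realize the same computation with hardware-efficient kernels.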

We then show the necessity of, and the challenges that arise from, infrastructure, especially in the context of modern implementations such as Flash Attention, and we develop even more efficient, performance-optimized implementations of NA, specifically for the most recent and popular AI hardware accelerators: the NVIDIA Hopper and Blackwell GPUs.

We build models based on the NA family, highlighting its superior quality and efficiency compared to existing approaches. We also plug NA into existing foundational models, showing that it can accelerate them by up to 1.6× end-to-end without further training, and by up to 2.6× end-to-end with training. We further demonstrate that our methodology can create sparse attention patterns that realize the theoretical limit of their speedups.

This work is open-sourced through the NATTEN project at natten.org.

 

 

Thesis PDF: https://alihassanijr.com/files/Hassani-Dissertation-2025-10-11.pdf

 

