PhD Defense | Less is More: Accelerating Vision by Eliminating Redundancy

Daniel Bolya - Machine Learning PhD Student - School of Interactive Computing

Date: April 12th

Time: 4:00 PM – 5:30 PM ET

Location: Coda C1115 Druid Hills

Meeting Link: https://gatech.zoom.us/j/96608837820?pwd=cGtSOXZMaHRVL0g0ZGN2aE9QeTNaZz09

Committee

Judy Hoffman (Advisor), School of Interactive Computing

James Hays, School of Interactive Computing

Zsolt Kira, School of Interactive Computing

Dhruv Batra, School of Interactive Computing

Christoph Feichtenhofer, FAIR, Meta

Abstract

The massive models that power today’s state of the art in computer vision require trillions of floating-point operations to run. But how many of these operations do we really need? Given how well techniques like pruning and quantization work, it’s clear that much of this computation is redundant. My work focuses on speeding up vision models by reducing redundancy with simple but powerful techniques. In this thesis defense, I’ll give a brief overview of all of my work and then focus on two projects: Token Merging, which merges redundant tokens in vision transformers for classification and diffusion, and Hiera, which removes redundant modules in modern vision architectures by explicitly teaching spatial bias. Then, I’ll show that these and other approaches can be combined for a multiplicative effect (e.g., ~10x speed-up on video).

Status

  • Workflow Status: Published
  • Created By: shatcher8
  • Created: 04/03/2024
  • Modified By: shatcher8
  • Modified: 04/03/2024
