
PhD Defense by Daniel Bolya

Title: Less is More: Accelerating Vision by Eliminating Redundancy

 

Date: April 12th, 2024

Time: 4:00pm - 5:30pm ET

Location: CODA C1115 Druid Hills

Zoom Link: https://gatech.zoom.us/j/96608837820?pwd=cGtSOXZMaHRVL0g0ZGN2aE9QeTNaZz09

 

Daniel Bolya

Machine Learning PhD Student

Interactive Computing
Georgia Institute of Technology

 

Committee

1. Judy Hoffman (Advisor, IC, GT)

2. James Hays (IC, GT)

3. Zsolt Kira (IC, GT)

4. Dhruv Batra (IC, GT)

5. Christoph Feichtenhofer (FAIR, Meta)

 

Abstract

The massive models that power today’s state of the art in computer vision require trillions of floating-point operations to compute. But how many of these operations do we really need? Given how well techniques like pruning or quantization work, it’s clear that much of this computation is redundant. My work focuses on speeding up vision models by reducing redundancy with simple but powerful techniques. In this thesis defense, I’ll give a brief overview of my work and then home in on two projects: Token Merging, which merges redundant tokens in vision transformers for classification and diffusion, and Hiera, which removes redundant modules in modern vision architectures by explicitly teaching spatial bias. Then, I’ll show that you can combine these and other approaches for a multiplicative effect (e.g., ~10x speed-up on video).

 

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 04/04/2024
  • Modified By: Tatianna Richardson
  • Modified: 04/04/2024
