PhD Defense | Less is More: Accelerating Vision by Eliminating Redundancy
Daniel Bolya - Machine Learning PhD Student - School of Interactive Computing
Date: April 12, 2024
Time: 4:00 PM – 5:30 PM ET
Location: Coda C1115 Druid Hills
Meeting Link: https://gatech.zoom.us/j/96608837820?pwd=cGtSOXZMaHRVL0g0ZGN2aE9QeTNaZz09
Committee
Judy Hoffman (Advisor), School of Interactive Computing
James Hays, School of Interactive Computing
Zsolt Kira, School of Interactive Computing
Dhruv Batra, School of Interactive Computing
Christoph Feichtenhofer, FAIR, Meta
Abstract
The massive models that power today’s state of the art in computer vision require trillions of floating-point operations to run. But how many of these operations do we really need? Given how well techniques like pruning and quantization work, it’s clear that much of this computation is redundant. My work focuses on speeding up vision models by reducing redundancy with simple but powerful techniques. In this thesis defense, I’ll give a brief overview of all of my work and then home in on two projects: Token Merging, which merges redundant tokens in vision transformers for classification and diffusion, and Hiera, which removes redundant modules in modern vision architectures by explicitly teaching spatial bias. Then, I’ll show that you can combine these and other approaches for a multiplicative effect (e.g., ~10x speed-up on video).
Status
- Workflow Status: Published
- Created By: shatcher8
- Created: 04/03/2024
- Modified By: shatcher8
- Modified: 04/03/2024