
PhD Defense by Shreyas Malakarjun Patil


Title: Leveraging sparsity in deep neural networks for training efficiency, interpretability and generalization


Date: November 18th

Time: 1:00 PM

Location: Conference room (Atlantic) on Coda 9th floor

Virtual attendance: https://gatech.zoom.us/j/92275630265


Shreyas Malakarjun Patil

Machine Learning PhD Student

ECE

Georgia Institute of Technology


Committee

1. Dr. Constantine Dovrolis, Professor, Computer Science, Georgia Tech (Advisor)

2. Dr. Ling Liu, Professor, Computer Science, Georgia Tech

3. Dr. Zsolt Kira, Associate Professor, Interactive Computing, Georgia Tech

4. Dr. Yingyan (Celine) Lin, Associate Professor, Computer Science, Georgia Tech

5. Dr. Loizos Michael, Associate Professor, Open University of Cyprus


Abstract

Sparse neural networks (sparse NNs) have fewer connections between consecutive layers than traditional fully connected, or dense, NNs. Historically, sparsity has been studied post-training to enhance inference efficiency and as a regularization mechanism to improve generalization; its benefits beyond these areas remain underexplored. In this thesis, we investigate sparse NNs, various sparsity patterns, and their broader benefits.


First, we introduce PHEW (Path with Higher Edge-Weights), a novel method for constructing sparse NNs at initialization, without using any training data. PHEW does not make any task-specific assumptions; instead, it exploits structural properties inherent in dense NNs that promote faster convergence and better generalization, thereby reducing the computational burden of training dense NNs. 
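The core idea of a data-free, weight-biased construction can be illustrated with a minimal sketch: starting from a randomly initialized dense network, run input-to-output random walks whose step probabilities favor higher-magnitude edges, and keep only the traversed edges. This is an illustrative simplification, not the published PHEW algorithm; the function name `phew_style_mask` and the exact walk scheme are assumptions for exposition.

```python
import numpy as np

def phew_style_mask(weights, num_walks=200, seed=0):
    """Sketch: keep only the edges visited by input-to-output random walks
    whose transition probabilities are proportional to |edge weight|."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros_like(w, dtype=bool) for w in weights]
    n_inputs = weights[0].shape[1]          # each w has shape (fan_out, fan_in)
    for _ in range(num_walks):
        unit = rng.integers(n_inputs)       # start the walk at a random input unit
        for layer, w in enumerate(weights):
            p = np.abs(w[:, unit])
            p = p / p.sum()                 # step probability proportional to |weight|
            nxt = rng.choice(w.shape[0], p=p)
            masks[layer][nxt, unit] = True  # retain the traversed edge
            unit = nxt
    return masks

# Usage: a tiny two-layer network, sparsified without any training data
rng = np.random.default_rng(1)
ws = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]
masks = phew_style_mask(ws, num_walks=20)
```

Because each walk keeps at most one edge per layer, the number of walks bounds the resulting density, giving direct control over sparsity before training begins.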


Second, we propose Neural Sculpting, a technique to uncover the underlying hierarchically modular task structure within NNs. Neural Sculpting uses an iterative process of pruning both units and edges during training, followed by network analysis to detect functional modules and infer hierarchical relationships between them. This method enhances the interpretability of NNs by guiding them to reflect the task’s inherent structure.
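The prune-edges-then-prune-units step can be sketched as follows. This is a minimal, hypothetical illustration of one sculpting iteration on two consecutive weight matrices, not the thesis's exact procedure: it drops the lowest-magnitude fraction of edges in a layer, then removes any hidden unit left with no incoming edges by zeroing its outgoing edges as well.

```python
import numpy as np

def sculpt_step(w_in, w_out, edge_frac=0.2):
    """One sketch iteration: prune the lowest-|w| edges of a layer, then
    prune hidden units that lost all incoming edges by removing their
    outgoing edges in the next layer. Shapes: w_in (hidden, in),
    w_out (out, hidden)."""
    alive = np.abs(w_in[w_in != 0])
    thresh = np.quantile(alive, edge_frac)
    w_in = np.where(np.abs(w_in) > thresh, w_in, 0.0)  # edge pruning
    dead = np.abs(w_in).sum(axis=1) == 0               # units with no inputs left
    w_out = w_out.copy()
    w_out[:, dead] = 0.0                               # unit pruning
    return w_in, w_out

# Usage: prune half the edges of the first layer of a tiny network
rng = np.random.default_rng(0)
w1 = rng.standard_normal((6, 4))
w2 = rng.standard_normal((3, 6))
w1_p, w2_p = sculpt_step(w1, w2, edge_frac=0.5)
```

Iterating such steps during training, and then analyzing the surviving connectivity graph, is what exposes candidate functional modules and their hierarchy.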


Finally, we leverage structural information about the task’s hierarchical modularity to enhance NN performance by aligning the architecture at initialization with the task’s structure. Specifically, we compare architectures ranging from monolithic dense NNs, which assume no prior knowledge, to hierarchically modular NNs with shared modules, which leverage sparsity, modularity, and module reusability. Incorporating modularity and module reuse significantly enhances learning efficiency and generalization, particularly in data-scarce scenarios.
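The contrast between a monolithic dense NN and a modular NN with shared modules can be made concrete with a small sketch. Here a task that applies the same sub-function to both halves of its input is mirrored by one `Module` instance reused at both call sites; the class and dimensions are hypothetical, chosen only to show how sharing cuts the parameter count relative to a dense layer.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class Module:
    """A small sub-network; sharing one instance across call sites reuses
    its parameters, mirroring a task that repeats a sub-function."""
    def __init__(self, dim, rng):
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return relu(self.w @ x)

rng = np.random.default_rng(0)
shared = Module(4, rng)
x = rng.standard_normal(8)
# The same module (same parameters) processes both halves of the input:
# the layer is sparse (no cross-half edges) and its parameters are reused.
hidden = np.concatenate([shared(x[:4]), shared(x[4:])])
```

A dense 8-to-8 layer would need 64 weights here; the shared module needs 16, which is one way modularity and reuse can pay off when data is scarce.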


In conclusion, this thesis demonstrates that sparse NNs offer not only enhanced training and inference efficiency but also superior interpretability and generalization capabilities.
