
PhD Defense by Shreyas Malakarjun Patil


Title: Leveraging sparsity in deep neural networks for training efficiency, interpretability and generalization


Date: November 18th

Time: 1:00 PM

Location: Conference room (Atlantic) on Coda 9th floor

Virtual attendance: https://gatech.zoom.us/j/92275630265


Shreyas Malakarjun Patil

Machine Learning PhD Student

ECE

Georgia Institute of Technology


Committee

1. Dr. Constantine Dovrolis, Professor, Computer Science, Georgia Tech (Advisor)

2. Dr. Ling Liu, Professor, Computer Science, Georgia Tech

3. Dr. Zsolt Kira, Associate Professor, Interactive Computing, Georgia Tech

4. Dr. Yingyan (Celine) Lin, Associate Professor, Computer Science, Georgia Tech

5. Dr. Loizos Michael, Associate Professor, Open University of Cyprus


Abstract

Sparse neural networks (sparse NNs) have fewer connections between consecutive layers than traditional fully connected, or dense, NNs. Historically, sparsity has been studied post-training to enhance inference efficiency and as a regularization mechanism to improve generalization; its benefits beyond these areas remain underexplored. In this thesis, we investigate sparse NNs, various sparsity patterns, and their broader benefits.


First, we introduce PHEW (Path with Higher Edge-Weights), a novel method for constructing sparse NNs at initialization, without using any training data. PHEW does not make any task-specific assumptions; instead, it exploits structural properties inherent in dense NNs that promote faster convergence and better generalization, thereby reducing the computational burden of training dense NNs. 
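The core idea of a data-free, weight-biased construction can be illustrated with a minimal sketch: starting from a randomly initialized dense network, run input-to-output random walks whose step probabilities favor higher-magnitude edges, and keep only the traversed edges. This is an illustrative simplification, not the published PHEW algorithm; the function name `phew_style_mask` and the exact walk scheme are assumptions for exposition.

```python
import numpy as np

def phew_style_mask(weights, num_walks=200, seed=0):
    """Sketch: keep only the edges visited by input-to-output random walks
    whose transition probabilities are proportional to |edge weight|."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros_like(w, dtype=bool) for w in weights]
    n_inputs = weights[0].shape[1]          # each w has shape (fan_out, fan_in)
    for _ in range(num_walks):
        unit = rng.integers(n_inputs)       # start the walk at a random input unit
        for layer, w in enumerate(weights):
            p = np.abs(w[:, unit])
            p = p / p.sum()                 # step probability proportional to |weight|
            nxt = rng.choice(w.shape[0], p=p)
            masks[layer][nxt, unit] = True  # retain the traversed edge
            unit = nxt
    return masks

# Usage: a tiny two-layer network, sparsified without any training data
rng = np.random.default_rng(1)
ws = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]
masks = phew_style_mask(ws, num_walks=20)
```

Because each walk keeps at most one edge per layer, the number of walks bounds the resulting density, giving direct control over sparsity before training begins.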


Second, we propose Neural Sculpting, a technique to uncover the underlying hierarchically modular task structure within NNs. Neural Sculpting uses an iterative process of pruning both units and edges during training, followed by network analysis to detect functional modules and infer hierarchical relationships between them. This method enhances the interpretability of NNs by guiding them to reflect the task’s inherent structure.
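The prune-edges-then-prune-units step can be sketched as follows. This is a minimal, hypothetical illustration of one sculpting iteration on two consecutive weight matrices, not the thesis's exact procedure: it drops the lowest-magnitude fraction of edges in a layer, then removes any hidden unit left with no incoming edges by zeroing its outgoing edges as well.

```python
import numpy as np

def sculpt_step(w_in, w_out, edge_frac=0.2):
    """One sketch iteration: prune the lowest-|w| edges of a layer, then
    prune hidden units that lost all incoming edges by removing their
    outgoing edges in the next layer. Shapes: w_in (hidden, in),
    w_out (out, hidden)."""
    alive = np.abs(w_in[w_in != 0])
    thresh = np.quantile(alive, edge_frac)
    w_in = np.where(np.abs(w_in) > thresh, w_in, 0.0)  # edge pruning
    dead = np.abs(w_in).sum(axis=1) == 0               # units with no inputs left
    w_out = w_out.copy()
    w_out[:, dead] = 0.0                               # unit pruning
    return w_in, w_out

# Usage: prune half the edges of the first layer of a tiny network
rng = np.random.default_rng(0)
w1 = rng.standard_normal((6, 4))
w2 = rng.standard_normal((3, 6))
w1_p, w2_p = sculpt_step(w1, w2, edge_frac=0.5)
```

Iterating such steps during training, and then analyzing the surviving connectivity graph, is what exposes candidate functional modules and their hierarchy.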


Finally, we leverage structural information about the task’s hierarchical modularity to enhance NN performance by aligning the architecture at initialization with the task’s structure. Specifically, we compare architectures ranging from monolithic dense NNs, which assume no prior knowledge, to hierarchically modular NNs with shared modules, which leverage sparsity, modularity, and module reusability. Incorporating modularity and module reuse significantly enhances learning efficiency and generalization, particularly in data-scarce scenarios.
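The contrast between a monolithic dense NN and a modular NN with shared modules can be made concrete with a small sketch. Here a task that applies the same sub-function to both halves of its input is mirrored by one `Module` instance reused at both call sites; the class and dimensions are hypothetical, chosen only to show how sharing cuts the parameter count relative to a dense layer.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class Module:
    """A small sub-network; sharing one instance across call sites reuses
    its parameters, mirroring a task that repeats a sub-function."""
    def __init__(self, dim, rng):
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return relu(self.w @ x)

rng = np.random.default_rng(0)
shared = Module(4, rng)
x = rng.standard_normal(8)
# The same module (same parameters) processes both halves of the input:
# the layer is sparse (no cross-half edges) and its parameters are reused.
hidden = np.concatenate([shared(x[:4]), shared(x[4:])])
```

A dense 8-to-8 layer would need 64 weights here; the shared module needs 16, which is one way modularity and reuse can pay off when data is scarce.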


In conclusion, this thesis demonstrates that sparse NNs offer not only enhanced training and inference efficiency but also superior interpretability and generalization capabilities.
