PhD Defense by Geonhwa Jeong
Title: Structured Sparsity-Aware Hardware-Software Co-Design for Deep Neural Network Acceleration
Date: Tuesday, Apr 2, 2024
Time: 1:00 PM – 3:00 PM ET
Location: Klaus 3126
Virtual: Click here to join the meeting
Geonhwa Jeong
Ph.D. Candidate
School of Computer Science
College of Computing
Georgia Institute of Technology
Committee:
Dr. Tushar Krishna (advisor), School of Electrical and Computer Engineering & School of Computer Science, Georgia Institute of Technology
Dr. Hyesoon Kim, School of Computer Science, Georgia Institute of Technology
Dr. Vivek Sarkar, School of Computer Science, Georgia Institute of Technology
Dr. Christopher Hughes, Parallel Computing Lab, Intel Labs
Dr. Joel Emer, Department of Electrical Engineering and Computer Science, MIT / Architecture Research Group, NVIDIA
Abstract
In diverse areas including, but not limited to, computer vision, natural language processing, and personalized recommendation, Deep Neural Networks (DNNs) have achieved remarkable performance, even exceeding that of humans on some tasks. While widely used across applications, DNNs are known for their high computational demands, motivating hardware and software enhancements to improve performance and energy efficiency. Exploiting various types of sparsity in DNNs has recently been proposed to reduce compute and memory requirements, but finding the proper target sparsity that satisfies both HW and SW requirements remains an active area of research. In this work, we develop HW-SW co-design methods to accelerate various DNNs by leveraging structured sparsity.
We first present RASA, an efficient register-aware systolic array, as a matrix engine. We develop techniques that divide an execution stage into several sub-stages and overlap instructions to hide overheads and run them concurrently. Second, we present VEGETA, a flexible structured sparse matrix engine that extends a dense matrix engine with flexible structured sparsity support. In addition, we show how VEGETA engines can serve different sparsity granularities, such as network-wise, layer-wise, and tile-wise. Next, we propose TASD, an approximation method that decomposes an unstructured sparse tensor into a sequence of structured sparse tensors. We also show how TASD can be applied to accelerate both dense and sparse DNNs on structured sparse matrix engines. Finally, we introduce SDQ, which combines sparsification and quantization, two techniques that complement each other, through structured decomposition to accelerate Large Language Models on structured sparse HW.
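The structured sparsity the abstract builds on is commonly expressed as N:M sparsity (at most N nonzeros in every group of M consecutive elements). The sketch below is illustrative only, not code from the thesis: `prune_n_m` applies N:M magnitude pruning, and `tasd_decompose` shows the TASD-style idea of approximating an unstructured sparse tensor as a sum of structured sparse tensors, each term pruning the previous residual. All function names and parameters here are hypothetical.

```python
import numpy as np

def prune_n_m(w, n=2, m=4):
    """Keep the n largest-magnitude values in every group of m
    consecutive elements along the last axis (N:M structured sparsity)."""
    w = np.asarray(w, dtype=float)
    groups = w.reshape(-1, m)                        # one row per group of m
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

def tasd_decompose(w, terms=2, n=2, m=4):
    """TASD-style idea: approximate an unstructured sparse tensor as a
    sum of N:M structured sparse tensors, each pruning the residual."""
    residual, parts = np.asarray(w, dtype=float), []
    for _ in range(terms):
        part = prune_n_m(residual, n, m)             # structured term
        parts.append(part)
        residual = residual - part                   # what is still missed
    return parts

w = np.array([[0.9, -0.1, 0.4, 0.05, 0.0, 0.7, -0.3, 0.2]])
approx = sum(tasd_decompose(w, terms=2))             # close to w
```

Each term is individually 2:4-structured, so each can run on a structured sparse matrix engine; summing a short sequence of such terms recovers most of an unstructured tensor.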