
PhD Defense by Geonhwa Jeong


Title: Structured Sparsity-Aware Hardware-Software Co-Design for Deep Neural Network Acceleration

Date: Tuesday, Apr 2, 2024

Time: 1:00 PM – 3:00 PM ET

Location: Klaus 3126 

Virtual: Click here to join the meeting

 

Geonhwa Jeong

Ph.D. Candidate

School of Computer Science

College of Computing

Georgia Institute of Technology

 

Committee:

Dr. Tushar Krishna (advisor), School of Electrical and Computer Engineering & School of Computer Science, Georgia Institute of Technology

Dr. Hyesoon Kim, School of Computer Science, Georgia Institute of Technology

Dr. Vivek Sarkar, School of Computer Science, Georgia Institute of Technology

Dr. Christopher Hughes, Parallel Computing Lab, Intel Labs

Dr. Joel Emer, Department of Electrical Engineering and Computer Science, MIT / Architecture Research Group, NVIDIA

 

Abstract

In diverse areas including, but not limited to, computer vision, natural language processing, and personalized recommendation, Deep Neural Networks (DNNs) have demonstrated remarkable performance, even exceeding that of humans on some tasks. While widely used across applications, DNNs are known for their high computational demands, motivating hardware and software enhancements to improve performance and energy efficiency. Exploiting various types of sparsity in DNNs has recently been proposed to reduce compute and memory requirements, but finding a target sparsity that satisfies both hardware (HW) and software (SW) requirements remains an active area of research. In this work, we develop HW-SW co-design methods to accelerate various DNNs by leveraging structured sparsity.

 

We first present RASA, an efficient register-aware systolic array for use as a matrix engine. We develop techniques that divide an execution stage into several sub-stages and overlap instructions across them, hiding overheads by running instructions concurrently. Second, we present VEGETA, a structured sparse matrix engine that extends a dense matrix engine with flexible structured-sparsity support, and we show how VEGETA engines can exploit different sparsity granularities, such as network-wise, layer-wise, and tile-wise. Next, we propose TASD, an approximation method that decomposes an unstructured sparse tensor into a series of structured sparse tensors, and we show how TASD can accelerate the execution of both dense and sparse DNNs on structured sparse matrix engines. Finally, we introduce SDQ, which applies sparsification and quantization, two techniques that complement each other, through structured decomposition to accelerate Large Language Models on structured sparse hardware.
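For illustration only, the sketch below shows one way an unstructured sparse matrix could be approximated by a sum of N:M structured sparse matrices, which is the general flavor of a TASD-style decomposition. The function names, the greedy top-magnitude strategy, and the NumPy realization are assumptions made for this example, not the dissertation's actual algorithm.

import numpy as np

def nm_project(x, n=2, m=4):
    # Keep the n largest-magnitude entries in every group of m along each row (N:M structured sparsity).
    rows, cols = x.shape
    assert cols % m == 0, "column count must be a multiple of the group size m"
    groups = x.reshape(rows, cols // m, m)
    keep = np.argsort(-np.abs(groups), axis=-1)[..., :n]   # indices of the top-n magnitudes per group
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return (groups * mask).reshape(rows, cols)

def tasd_like_decompose(x, terms=2, n=2, m=4):
    # Greedily peel off `terms` N:M structured sparse components; whatever remains is the residual.
    residual = x.copy()
    components = []
    for _ in range(terms):
        comp = nm_project(residual, n, m)
        components.append(comp)
        residual = residual - comp
    return components, residual

rng = np.random.default_rng(0)
dense = rng.standard_normal((4, 8))
unstructured = dense * (rng.random((4, 8)) > 0.6)        # unstructured sparsity, no N:M pattern
comps, res = tasd_like_decompose(unstructured, terms=1)  # a single 2:4 term captures only part of it
print("residual energy fraction:",
      np.linalg.norm(res) / np.linalg.norm(unstructured))

Adding more structured terms shrinks the residual, which is what lets hardware built for structured sparsity approximate unstructured sparse computation.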

 
