
Ph.D. Dissertation Defense - Saeed Rashidi

Title: HW/SW Methods for Scalable Training of Deep Learning Models

Committee:

Dr. Tushar Krishna, ECE, Chair, Advisor

Dr. Alexandros Daglis, CoC

Dr. Alexey Tumanov, CoC

Dr. Srinivas Sridharan, Meta

Dr. Zhihao Jia, CMU

Abstract: The objective of this thesis is to present novel HW/SW techniques for designing platforms for distributed training of Deep Learning (DL) models. DL applications are becoming an integral part of our society due to their broad use in domains such as vision, language processing, recommendation systems, and speech processing. Before being deployed, DL models must be trained on training samples over many iterations to reach the desired accuracy. To improve accuracy, DL models are constantly growing in both model size and the number of training samples they require, making training extremely challenging: a given model can take months or even years to train. Distributed training aims to improve training speed by distributing the training task across many accelerators. However, it introduces new overheads, such as communication overhead, that can limit scalability if left unaddressed.
