
Ph.D. Proposal Oral Exam - Saeed Rashidi


Title: HW/SW Methods for Scalable Training of Deep Learning Models

Committee: 

Dr. Krishna, Advisor      

Dr. Daglis, Chair

Dr. Tumanov

Abstract: The objective of the proposed research is to present novel HW/SW techniques for designing platforms for distributed training of Deep Learning (DL) models. DL applications have become an integral part of society thanks to their broad use across domains such as vision, natural language processing, recommendation systems, and speech processing. Before deployment, a DL model must be trained on many samples over many iterations to reach the desired accuracy. To improve accuracy, DL models are constantly growing in both parameter count and training-set size, making training extremely challenging: a single model can take months or even years to train. Distributed training improves training speed by spreading the training task across many accelerators. However, it introduces new overheads, such as communication overhead, that can limit scalability if left unaddressed.
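As context for the communication overhead mentioned above: in data-parallel distributed training, each accelerator computes gradients on its own shard of a mini-batch, and an all-reduce step then averages those gradients so every worker applies the same update. A minimal sketch of that averaging step, simulating the workers in-process with plain Python (the function name and structure are illustrative, not from the proposal):

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors, as a gradient all-reduce would.

    worker_grads: list of equal-length gradient lists, one per simulated worker.
    Returns the averaged gradient that every worker would hold afterward.
    """
    n_workers = len(worker_grads)
    grad_len = len(worker_grads[0])
    # Reduce: element-wise sum across workers, then scale by 1/n_workers.
    summed = [sum(g[i] for g in worker_grads) for i in range(grad_len)]
    return [s / n_workers for s in summed]

# Two simulated workers, each with gradients from its own data shard.
grads = [[1.0, 2.0], [3.0, 4.0]]
print(allreduce_mean(grads))  # [2.0, 3.0]
```

In a real system this exchange runs over the interconnect (e.g., as a ring or tree all-reduce), and its cost grows with model size and worker count, which is precisely the scalability bottleneck the proposal targets.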

