PhD Defense by Mikhail Isaev

Title: Methodologies for co-designing supercomputer-scale systems and deep learning software

 

Mikhail Isaev

Computer Science PhD Student

School of Computational Science and Engineering

Georgia Institute of Technology

 

Date: Tuesday, Feb 20, 2024

Time: 13:00 – 15:00 EST

Location: C1015 (Vinings) and Teams

 

Committee:

Dr. Richard W. Vuduc (advisor), School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Jeffrey Young, School of Computer Science, Georgia Institute of Technology

Dr. Tushar Krishna, School of Electrical and Computer Engineering, Georgia Institute of Technology

Dr. Hyesoon Kim, School of Computer Science, Georgia Institute of Technology

Dr. Nicholas G. McDonald, Nvidia Research

 

Abstract: 

This dissertation introduces new methodologies to co-design deep learning software and supercomputer hardware for large-scale training.

 

The first is an analytical performance model to co-design large language models (LLMs) and supercomputer architectures during the early phases of the system design process. On the algorithm side, we consider diverse implementation strategies, including data, tensor, and pipeline parallelism, communication-computation overlap, and memory optimization. The hardware aspect includes hierarchical memory systems, multiple interconnection networks, and parameterized efficiencies based on operation size. We implement this model in Calculon, an open-source tool that can estimate performance for billions of strategy and architecture combinations. This facilitates co-design-space exploration for future LLMs with trillions of parameters, yielding insights into optimal system characteristics and the interplay between algorithmic and architectural decisions.
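
To give a flavor of what such an analytical model computes, the minimal Python sketch below estimates per-step training time from a handful of hardware and parallelism parameters. It is purely illustrative: every parameter name and formula here is a simplified assumption and does not reproduce Calculon's actual model.

    def estimate_step_time(flops_per_layer, bytes_per_layer, n_layers,
                           peak_flops, mem_bw, net_bw,
                           tp, pp, dp, n_micro, overlap=True):
        """Illustrative per-step time estimate for one pipeline stage.

        All formulas are deliberately simplified assumptions,
        not Calculon's model.
        """
        layers_per_stage = n_layers / pp
        # Compute time: per-layer FLOPs are split across tensor-parallel ranks.
        t_compute = layers_per_stage * flops_per_layer / (tp * peak_flops)
        # Memory time: weights and activations streamed through local memory.
        t_memory = layers_per_stage * bytes_per_layer / mem_bw
        # Communication time: tensor-parallel collectives plus data-parallel
        # gradient synchronization (ring all-reduce volume factor).
        t_comm = layers_per_stage * bytes_per_layer / net_bw
        t_comm += 2 * (dp - 1) / dp * layers_per_stage * bytes_per_layer / net_bw
        # With overlap, communication hides behind compute and memory traffic;
        # without it, the terms serialize.
        per_stage = (max(t_compute, t_memory, t_comm) if overlap
                     else t_compute + t_memory + t_comm)
        # Rough pipeline fill/drain ("bubble") overhead.
        return per_stage * (1 + (pp - 1) / n_micro)

Sweeping a function like this over every combination of (tp, pp, dp) and candidate hardware parameters is the kind of exhaustive exploration described above, though the real tool models far more detail, such as memory capacity limits, overlap granularity, and per-operation efficiencies.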

 

As models exceed 100 trillion parameters, memory capacity and network speed become critical bottlenecks. For the former, Calculon suggests adding slower, high-capacity memory to store all intermediate tensors and model parameters while utilizing faster memory solely for current computation. For the latter, we present novel distributed-memory parallel matrix multiplication algorithms capable of hiding communication entirely, potentially achieving perfect scaling.
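
As an illustration of the communication-hiding idea (not the dissertation's algorithms), the sketch below overlaps a ring shift of B blocks with local computation using non-blocking MPI calls and double buffering. The partitioning scheme and all names are assumptions made for this example.

    from mpi4py import MPI
    import numpy as np

    def ring_matmul(A_rows, B_rows, comm):
        """C_rows = A_rows @ B, with A row-partitioned (m x n) and B
        row-partitioned (n/p x k) across the p ranks of `comm`."""
        p, rank = comm.Get_size(), comm.Get_rank()
        left, right = (rank - 1) % p, (rank + 1) % p
        n_local = B_rows.shape[0]
        C_rows = np.zeros((A_rows.shape[0], B_rows.shape[1]), dtype=A_rows.dtype)

        cur = np.ascontiguousarray(B_rows)   # B block currently held
        nxt = np.empty_like(cur)             # landing buffer for the next block
        for step in range(p):
            reqs = []
            if step < p - 1:
                # Start shifting B around the ring before computing, so the
                # transfer proceeds while the local matmul runs.
                reqs = [comm.Isend(cur, dest=left),
                        comm.Irecv(nxt, source=right)]
            # The block held at this step originated on rank (rank + step) % p,
            # so it pairs with the matching column slice of A_rows.
            owner = (rank + step) % p
            A_slice = A_rows[:, owner * n_local:(owner + 1) * n_local]
            C_rows += A_slice @ cur
            if reqs:
                MPI.Request.Waitall(reqs)
                cur, nxt = nxt, cur
        return C_rows

When the local multiply takes at least as long as shifting one block, the communication disappears from the critical path, which is the sense in which such schemes can approach perfect scaling.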

 

Looking ahead, we foresee a need to model artificial intelligence (AI) applications beyond LLMs and perform detailed system simulations in later design stages. Our second open-source tool, ParaGraph, translates compiled parallel programs into high-level graphs for emulator-based dynamic execution in network simulation environments. Case studies on deep learning workloads extracted from JAX and TensorFlow programs illustrate ParaGraph's utility for software-hardware co-design workflows, including communication optimization and hardware bottleneck identification.
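
For a sense of the kind of operator graph involved, the short sketch below traces a toy JAX function into a jaxpr and prints its dataflow edges. This is only an illustration of graph extraction from a JAX program; it is not how ParaGraph processes compiled parallel programs.

    import jax
    import jax.numpy as jnp

    def mlp(x, w1, w2):
        return jnp.tanh(x @ w1) @ w2

    x, w1, w2 = jnp.ones((8, 16)), jnp.ones((16, 32)), jnp.ones((32, 4))

    # Trace the function into a jaxpr: a flat sequence of primitive operations
    # whose input/output variables define a dataflow graph.
    closed = jax.make_jaxpr(mlp)(x, w1, w2)
    for eqn in closed.jaxpr.eqns:
        print(eqn.primitive.name,
              [str(v) for v in eqn.invars], "->",
              [str(v) for v in eqn.outvars])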

 
