event

PhD Defense by Amey Agrawal

Title: Towards Efficient and Predictable Large-Scale AI Systems

Date: Friday, March 20, 2026

Time: 2:00 PM – 4:00 PM ET

Location: Klaus Advanced Computing Building (KACB), Room 3100

 

Candidate:

Amey Agrawal, School of Computer Science, Georgia Tech

 

Committee:

Dr. Alexey Tumanov (Advisor & Chair), School of Computer Science, Georgia Tech

Dr. Vijay Ganesh, School of Computer Science, Georgia Tech

Dr. Tushar Krishna, School of Electrical and Computer Engineering, Georgia Tech

Dr. Ram Ramjee, Partner Research Manager, Microsoft Research

Dr. Srinivas Sridharan, Distinguished Engineer, NVIDIA

 

Abstract: Serving large AI models efficiently while guaranteeing low and predictable latency is the central systems challenge in deploying modern AI. This thesis addresses this challenge through two complementary thrusts. First, we build inference systems that maximize hardware utilization by exploiting the unique properties of these workloads — resolving latency-throughput tradeoffs, scaling to multi-million token contexts, optimizing across memory hierarchies, and enabling efficient data movement across distributed components. Second, we develop deployment optimization systems that identify optimal configurations by jointly reasoning about model architecture, hardware capabilities, workload characteristics, and user requirements for cost and latency. Together, these contributions achieve multi-fold improvements in serving capacity and latency, with core techniques adopted by major open-source inference frameworks serving millions of GPU-hours weekly.

 

Status

  • Workflow status: Published
  • Created by: Tatianna Richardson
  • Created: 03/05/2026
  • Modified By: Tatianna Richardson
  • Modified: 03/05/2026
