event
PhD Defense by Amey Agrawal
Title: Towards Efficient and Predictable Large-Scale AI Systems
Date: Friday, March 20, 2026
Time: 2:00 PM – 4:00 PM EDT
Location: Klaus Advanced Computing Building (KACB), Room 3100.
Candidate:
Amey Agrawal, School of Computer Science, Georgia Tech
Committee:
Dr. Alexey Tumanov (Advisor & Chair), School of Computer Science, Georgia Tech
Dr. Vijay Ganesh, School of Computer Science, Georgia Tech
Dr. Tushar Krishna, School of Electrical and Computer Engineering, Georgia Tech
Dr. Ram Ramjee, Partner Research Manager, Microsoft Research
Dr. Srinivas Sridharan, Distinguished Engineer, NVIDIA
Abstract: Serving large AI models efficiently while guaranteeing low and predictable latency is the central systems challenge in deploying modern AI. This thesis addresses this challenge through two complementary thrusts. First, we build inference systems that maximize hardware utilization by exploiting the unique properties of these workloads — resolving latency-throughput tradeoffs, scaling to multi-million token contexts, optimizing across memory hierarchies, and enabling efficient data movement across distributed components. Second, we develop deployment optimization systems that identify optimal configurations by jointly reasoning about model architecture, hardware capabilities, workload characteristics, and user requirements for cost and latency. Together, these contributions achieve multi-fold improvements in serving capacity and latency, with core techniques adopted by major open-source inference frameworks serving millions of GPU-hours weekly.
Status
- Workflow status: Published
- Created by: Tatianna Richardson
- Created: 03/05/2026
- Modified By: Tatianna Richardson
- Modified: 03/05/2026