PhD Proposal by Hao Kang
Title: Configurable Execution Systems for Dynamic Agentic AI Workloads
Date: Tuesday, May 5th, 2026
Time: 1:30–3:30 pm EST
Location: Remote
Zoom: https://gatech.zoom.us/j/93623089632
Hao Kang
Ph.D. Student in Computer Science
School of Computer Science
Georgia Institute of Technology
https://haokang-timmy.github.io/
Committee members
Dr. Tushar Krishna (advisor) - School of Computer Science, Georgia Institute of Technology
Dr. Alexey Tumanov - School of Computer Science, Georgia Institute of Technology
Dr. Tuo Zhao - School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Song Han - EECS, Massachusetts Institute of Technology
Abstract
Large Language Model (LLM) systems are increasingly deployed as agents that interact with external tools, simulators, markets, games, and software environments. Unlike conventional LLM serving, where workloads are often modeled as independent requests optimized for static throughput or latency targets, agentic execution exposes dynamic environment states, heterogeneous resource lifecycles, and task-dependent latency–quality trade-offs. In this dissertation, we study how to make AI systems configurable for dynamic agentic execution. The central goal is to design system abstractions and mechanisms that expose the right configuration space across precision, scheduling, memory management, and tool-resource orchestration, so that deployment can be specialized to the characteristics of the task, model, and environment rather than relying on one-size-fits-all serving policies.
In the first direction, we study latency-sensitive agent decision tasks, where an agent's final reward is jointly determined by response quality and inference latency. We introduce two real-time evaluation environments: HFTBench, a high-frequency trading simulator, and StreetFighter, a competitive gaming environment. These benchmarks show that different environments prefer different operating points along the latency–quality trade-off: some tasks reward faster but less accurate actions, while others require both low latency and high decision quality. To address this, we propose FPX, a configurable mixed-precision inference framework that exposes fine-grained latency–quality control through model-size selection and layer-wise FP8/FP4 precision assignment. FPX allows the system to choose task-appropriate inference configurations and improves downstream reward over fixed-precision baselines. This direction demonstrates that agent efficiency should be measured not only by standalone model accuracy or raw throughput, but also by how system configurations affect final reward in dynamic environments.
The second direction studies long-horizon multi-turn agentic execution, where agents repeatedly alternate between LLM reasoning and external tool execution. Existing systems loosely compose an inference engine with a general-purpose tool orchestrator, which limits the system to request-level scheduling and leads to KV-cache thrashing, cross-node memory imbalance, and unmanaged tool-resource lifecycles. We propose ThunderAgent, a program-aware agentic inference system that abstracts each workflow as an LLM Program spanning model invocations, KV-cache states, and tool environments. Built on this abstraction, ThunderAgent introduces program-aware scheduling and tool-resource management to configure execution across GPU memory, data-parallel nodes, and external environments. This enables high-throughput serving and reinforcement-learning rollouts while reducing recomputation and resource leakage.
Together, these directions argue that future AI infrastructure should not treat agentic workloads as static requests. Instead, systems should expose configurable execution mechanisms that can be specialized to the dynamic requirements of models, tasks, and environments, enabling efficient and robust execution for emerging agentic AI applications.