
PhD Defense by Divya Kiran Kadiyala


Title: Memory System Optimizations for Parallel and Bandwidth-Intensive Workloads


Date: Monday, November 17, 2025

Time: 10:30 AM – 12:30 PM ET

Location: Hybrid


Divya Kiran Kadiyala

School of Electrical and Computer Engineering

Georgia Institute of Technology


Committee:

Dr. Alexandros Daglis (Advisor) – School of Computer Science, Georgia Institute of Technology

Dr. Moinuddin K. Qureshi – School of Computer Science, Georgia Institute of Technology

Dr. Tushar Krishna – School of Electrical and Computer Engineering, Georgia Institute of Technology

Dr. Yingyan (Celine) Lin – School of Computer Science, Georgia Institute of Technology

Dr. Puneet Sharma – Networking and Distributed Systems Lab (NDSL), Hewlett Packard Enterprise


Abstract:
Modern datacenters form the foundation of today's digital infrastructure, supporting large-scale web services, enterprise cloud platforms, and emerging generative AI applications that process and exchange massive volumes of data. As processors continue to scale in core count and computational throughput, the disparity between compute capability and memory performance has become a critical bottleneck, manifesting as limitations in memory capacity, bandwidth, and latency. This growing imbalance, compounded by the slowdown of Moore's Law and increasing system complexity, poses a fundamental challenge to sustaining performance for data-intensive and highly parallel workloads. Addressing these challenges requires rethinking the memory hierarchy through innovations that jointly consider workload characteristics, hardware capabilities, and system-level interactions.

This dissertation presents a holistic, cross-layer co-design approach to overcoming the memory wall by optimizing the memory hierarchy across the chip, server, and cluster levels. At the chip level, HinTM enhances the effective on-chip capacity and transactional concurrency of Hardware Transactional Memory (HTM) through a hardware-software co-design approach. At the server level, SURGE dynamically harvests idle I/O bandwidth over CXL links to boost effective memory bandwidth and reduce access latency under bandwidth-bound conditions. At the cluster level, COMET provides a composable modeling and co-optimization framework that enables rapid design space exploration across model, algorithm, and hardware resources for distributed AI training. Together, these contributions advance the design of efficient, workload-aware memory systems that sustain high performance across parallel and bandwidth-intensive computing environments.
