
PhD Defense by Divya Kiran Kadiyala


Title: Memory System Optimizations for Parallel and Bandwidth-Intensive Workloads


Date: Monday, November 17, 2025

Time: 10:30 AM – 12:30 PM ET

Location: Hybrid


Divya Kiran Kadiyala

School of Electrical and Computer Engineering

Georgia Institute of Technology


Committee:

Dr. Alexandros Daglis (Advisor) – School of Computer Science, Georgia Institute of Technology

Dr. Moinuddin K. Qureshi – School of Computer Science, Georgia Institute of Technology

Dr. Tushar Krishna – School of Electrical and Computer Engineering, Georgia Institute of Technology

Dr. Yingyan (Celine) Lin – School of Computer Science, Georgia Institute of Technology

Dr. Puneet Sharma – Networking and Distributed Systems Lab (NDSL), Hewlett Packard Enterprise


Abstract:
Modern datacenters form the foundation of today's digital infrastructure, supporting large-scale web services, enterprise cloud platforms, and emerging generative AI applications that process and exchange massive volumes of data. As processors continue to scale in core count and computational throughput, the disparity between compute capability and memory performance has become a critical bottleneck, manifesting as limitations in memory capacity, bandwidth, and latency. This growing imbalance, compounded by the slowdown of Moore's Law and increasing system complexity, poses a fundamental challenge to sustaining performance for data-intensive and highly parallel workloads. Addressing these challenges requires rethinking the memory hierarchy through innovations that jointly consider workload characteristics, hardware capabilities, and system-level interactions.

This dissertation presents a holistic, cross-layer co-design approach to overcoming the memory wall by optimizing the memory hierarchy across the chip, server, and cluster levels. At the chip level, HinTM enhances the effective on-chip capacity and transactional concurrency of Hardware Transactional Memory (HTM) through a hardware-software co-design approach. At the server level, SURGE dynamically harvests idle I/O bandwidth over CXL links to boost effective memory bandwidth and reduce access latency under bandwidth-bound conditions. At the cluster level, COMET provides a composable modeling and co-optimization framework that enables rapid design space exploration across model, algorithm, and hardware resources for distributed AI training. Together, these contributions advance the design of efficient, workload-aware memory systems that sustain high performance across parallel and bandwidth-intensive computing environments.
