PhD Defense by Divya Kiran Kadiyala
Title: Memory System Optimizations for Parallel and Bandwidth-Intensive Workloads
Date: Monday, November 17, 2025
Time: 10:30 AM – 12:30 PM ET
Location: Hybrid
- In-person: Klaus 1120A conference room
- Teams: https://teams.microsoft.com/l/meetup-join/19%3ameeting_NWFlNGIzZTgtZTQxOC00OWJmLTljMjktMDVhMGQ3MTUwMjBj%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%220c913e5c-d645-4b33-ace1-b55792935f30%22%7d
Divya Kiran Kadiyala
School of Electrical and Computer Engineering
Georgia Institute of Technology
Committee:
Dr. Alexandros Daglis (Advisor) – School of Computer Science, Georgia Institute of Technology
Dr. Moinuddin K. Qureshi – School of Computer Science, Georgia Institute of Technology
Dr. Tushar Krishna – School of Electrical and Computer Engineering, Georgia Institute of Technology
Dr. Yingyan (Celine) Lin – School of Computer Science, Georgia Institute of Technology
Dr. Puneet Sharma – Networking and Distributed Systems Lab (NDSL), Hewlett Packard Enterprise
Abstract:
Modern datacenters form the foundation of today's digital infrastructure, supporting large-scale web services, enterprise cloud platforms, and emerging generative AI applications that process and exchange massive volumes of data. As processors continue to scale in core count and computational throughput, the disparity between compute capability and memory performance has become a critical bottleneck—manifesting as limitations in memory capacity, bandwidth, and latency. This growing imbalance, compounded by the slowdown of Moore's Law and increasing system complexity, poses a fundamental challenge to sustaining performance for data-intensive and highly parallel workloads. Addressing these challenges requires rethinking the memory hierarchy through innovations that jointly consider workload characteristics, hardware capabilities, and system-level interactions.
This dissertation presents a holistic, cross-layer co-design approach to overcome the memory wall by optimizing the memory hierarchy across chip, server, and cluster levels. At the chip level, HinTM enhances the effective on-chip capacity and transactional concurrency of Hardware Transactional Memory (HTM) through a hardware-software co-design approach. At the server level, SURGE dynamically harvests idle I/O bandwidth over CXL links to boost effective memory bandwidth and reduce access latency under bandwidth-bound conditions. At the cluster level, COMET provides a composable modeling and co-optimization framework that enables rapid design space exploration across model, algorithm, and hardware resources for distributed AI training. Together, these contributions advance the design of efficient and workload-aware memory systems that sustain high performance across parallel and bandwidth-intensive computing environments.