PhD Defense by Hyojong Kim

Event Details
  • Date/Time:
    • Monday, January 6, 2020
      12:00 pm - 2:00 pm
  • Location: Klaus 2100


Title: Techniques to Mitigate Performance Impact of Off-chip Data Migrations in Modern GPU Computing


Hyojong Kim

School of Computer Science

College of Computing

Georgia Institute of Technology


Date: Monday, Jan 6, 2020

Time: 12:00 PM - 2:00 PM (EST)

Location: Klaus 2100



Committee:

Dr. Hyesoon Kim (Advisor, School of Computer Science, Georgia Institute of Technology)

Dr. Ada Gavrilovska (School of Computer Science, Georgia Institute of Technology)

Dr. Milos Prvulovic (School of Computer Science, Georgia Institute of Technology)

Dr. Moinuddin Qureshi (School of Electrical and Computer Engineering, Georgia Institute of Technology)

Dr. Vivek Sarkar (School of Computer Science, Georgia Institute of Technology)



Graphics Processing Units (GPUs) have been used successfully for accelerating a wide variety of applications over the last decade. In response to growing compute and memory capacity requirements, modern systems are equipped to distribute the work over multiple GPUs and to pool the memory from the host (i.e., system memory) and other GPUs transparently. Compute capacity scales out with multiple GPUs, and the memory capacity afforded by the host is an order of magnitude larger than the GPUs’ device memory. However, both of these approaches require data to be migrated over the system interconnect (e.g., PCI-e) during program execution. Since migrating data over the system interconnect takes far longer than accessing the GPU’s internal memory hierarchy, the efficacy of these approaches in achieving high performance depends strongly on the data migration overhead. This dissertation proposes several techniques that help mitigate this data migration overhead.


In a system with multiple GPUs, where local and remote memory access latencies differ greatly, it is crucial to co-locate compute and data to achieve high performance. This thesis discusses how to enable co-location of compute and data in such systems. The proposed mechanism estimates the amount of exclusive data and selectively allocates it in a single GPU while distributing the shared data across multiple GPUs. For this selective coarse-grained allocation, it uses a dual address mode with lightweight changes to virtual-to-physical page mappings. To place compute in the same GPU as the data it accesses, it uses an affinity-based thread block scheduling policy. This enables efficient use of multiple GPUs while minimizing unnecessary off-chip data migrations.
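As a rough illustration of the placement and scheduling idea described above, the following Python sketch models a hypothetical two-GPU system: exclusive pages are pinned to a single owner GPU, shared pages are interleaved across GPUs, and each thread block is scheduled on the GPU that holds the majority of the pages it touches. All names, the page-to-GPU mapping, and the configuration are illustrative assumptions, not details from the dissertation.

```python
from collections import Counter

NUM_GPUS = 2  # hypothetical 2-GPU system

def home_gpu(page, exclusive_pages, owner_gpu):
    """Exclusive pages live on a single owner GPU; shared pages are
    interleaved (round-robin by page number) across all GPUs."""
    if page in exclusive_pages:
        return owner_gpu
    return page % NUM_GPUS

def schedule_block(pages_touched, exclusive_pages, owner_gpu):
    """Affinity-based policy: run the thread block on the GPU that
    holds the majority of the pages the block will access."""
    votes = Counter(home_gpu(p, exclusive_pages, owner_gpu)
                    for p in pages_touched)
    return votes.most_common(1)[0][0]

exclusive = {10, 11, 12}  # pages estimated to be exclusive, owned by GPU 0
# A block touching mostly exclusive data runs on the owner GPU:
print(schedule_block([10, 11, 12, 3], exclusive, owner_gpu=0))  # 0
# A block touching only shared pages follows the interleaving:
print(schedule_block([5, 7, 9], exclusive, owner_gpu=0))        # 1
```

With this policy, most of a block’s accesses hit pages resident on its own GPU, so off-chip traffic over the interconnect is limited to the minority of remote pages.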


Support for unified virtual memory and demand paging in modern GPUs provides a coherent view of a single virtual address space between CPUs and GPUs. This allows GPUs to access pages that reside in CPU memory as if they were local to the GPU, so applications that would otherwise be impossible to run due to memory capacity constraints run seamlessly. This thesis discusses how to alleviate major inefficiencies that arise in the page fault handling mechanism employed in contemporary GPUs. The proposed mechanism supports CPU-like thread block context switching to reduce the number of batches (i.e., groups of page faults handled together) and amortize the batch processing overhead. To take page eviction off the critical path, it modifies the runtime software to overlap page evictions with CPU-to-GPU page migrations without requiring any hardware changes.
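A back-of-envelope model may help convey why taking eviction off the critical path matters. In the sketch below (the function names and per-page transfer times are illustrative assumptions, not measurements from the dissertation), a fault under memory pressure must free a page by evicting a victim and then migrate the requested page in; if these happen serially, their costs add, whereas if evictions overlap with CPU-to-GPU migrations, each fault pays roughly the longer of the two transfers.

```python
def serial_fault_cost(evict_us, migrate_us, faults):
    """Baseline: each fault evicts a victim page and only then migrates
    the requested page -- eviction sits on the critical path."""
    return faults * (evict_us + migrate_us)

def overlapped_fault_cost(evict_us, migrate_us, faults):
    """Eviction overlapped with CPU-to-GPU migration: after one
    un-overlapped eviction to prime the pipeline, each fault costs
    only the longer of the two per-page transfers."""
    return evict_us + faults * max(evict_us, migrate_us)

FAULTS = 64
EVICT_US, MIGRATE_US = 20, 25  # hypothetical per-page transfer times
print(serial_fault_cost(EVICT_US, MIGRATE_US, FAULTS))      # 2880
print(overlapped_fault_cost(EVICT_US, MIGRATE_US, FAULTS))  # 1620
```

Under these assumed numbers the overlapped scheme hides most of the eviction time behind the inbound migrations, which is the effect the runtime-software change described above aims for.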

Invited Audience
Faculty/Staff, Public, Graduate students, Undergraduate students