
PhD Proposal by Hyojong Kim


Title: Techniques to Mitigate Performance Impact of Off-chip Data Migrations in Modern GPU Computing

 

Hyojong Kim

Ph.D. Student

School of Computer Science

College of Computing

Georgia Institute of Technology

 

Date: Friday, May 17, 2019

Time: 10:00 AM - 12:00 PM (EDT)

Location: Klaus 2100

 

Committee:

Dr. Hyesoon Kim (Advisor, School of Computer Science, Georgia Institute of Technology)

Dr. Ada Gavrilovska (School of Computer Science, Georgia Institute of Technology)

Dr. Moinuddin Qureshi (School of Electrical and Computer Engineering, Georgia Institute of Technology)

 

Abstract:

In response to unprecedented demand for compute and memory, modern graphics processing units (GPUs) allow the use of multiple GPUs in a system or the use of system memory (i.e., CPU memory) in a user-transparent manner. Compute capability scales out with multiple GPUs, and system memory provides an order-of-magnitude larger memory capacity to a GPU application. However, both techniques require data to be migrated over the system bus (e.g., the PCIe bus in modern systems) on demand during execution. Because data migration over the PCIe bus takes far longer than the memory accesses traditional GPUs are designed for, the efficacy of these techniques in delivering high performance depends on mitigating the data migration overhead.
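To put the migration overhead in perspective, a back-of-the-envelope comparison can be sketched as below. The bandwidth figures are illustrative assumptions, not measurements from the thesis: a PCIe 3.0 x16 link peaks near 16 GB/s, while HBM2-class local GPU memory of this era reaches roughly 900 GB/s.

```python
# Back-of-the-envelope comparison of local vs. off-chip data movement.
# Bandwidth figures are illustrative assumptions, not measurements:
# PCIe 3.0 x16 peaks near 16 GB/s; HBM2 local memory near 900 GB/s.
PCIE_GBPS = 16.0
HBM2_GBPS = 900.0

def transfer_ms(megabytes: float, gbps: float) -> float:
    """Time in milliseconds to move `megabytes` at `gbps` GB/s."""
    return megabytes / 1024.0 / gbps * 1000.0

local = transfer_ms(64, HBM2_GBPS)   # 64 MB served from local HBM2
remote = transfer_ms(64, PCIE_GBPS)  # same 64 MB migrated over PCIe
print(f"local: {local:.3f} ms, remote: {remote:.3f} ms, "
      f"slowdown: {remote / local:.0f}x")
```

Under these assumed numbers, the same data takes more than 50x longer to arrive over the system bus than from local memory, which is why hiding or avoiding migrations dominates end-to-end performance.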

 

In this thesis proposal, I propose several ideas to help mitigate the data migration overhead. First, I propose CODA, a mechanism to co-locate computation and data in multi-GPU systems. CODA estimates the amount of exclusive data and selectively allocates it on a single GPU in the presence of fine-grained memory interleaving, while distributing shared data across multiple GPUs. It uses an affinity-based thread block scheduling policy to place computation on the same GPU as the data it accesses. This enables efficient use of multiple GPUs by exploiting their aggregate compute capability while minimizing unnecessary off-chip data migrations.

Next, I propose SCD, a mechanism for efficient unified memory management in modern GPUs. SCD reduces major inefficiencies in the page fault handling mechanism employed by modern GPUs. It supports CPU-like thread block context switching to reduce the number of batch-processing invocations and amortize their overhead. It takes page eviction off the critical path, with no hardware changes, by overlapping evictions with CPU-to-GPU page migrations. Finally, it reduces CPU-to-GPU page migration time through lightweight, on-the-fly compression and decompression.
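The affinity-based placement idea behind CODA can be illustrated with a small sketch. This is not the actual mechanism from the proposal: the page-ownership map and the scheduler interface here are hypothetical, standing in for whatever bookkeeping CODA's exclusive/shared data placement provides.

```python
from collections import Counter

# Hypothetical sketch of affinity-based thread block scheduling:
# dispatch each thread block to the GPU that owns the plurality of
# the memory pages it is expected to touch, so most of its accesses
# stay local and off-chip (PCIe) migrations are avoided.

def schedule_block(pages: list, page_to_gpu: dict) -> int:
    """Pick the GPU owning the plurality of this block's pages.

    `page_to_gpu` is an assumed page-to-GPU ownership map; CODA
    would derive something like it from its data placement.
    """
    votes = Counter(page_to_gpu[p] for p in pages)
    return votes.most_common(1)[0][0]

# Example: pages 0-3 live on GPU 0, pages 4-7 on GPU 1.
ownership = {p: p // 4 for p in range(8)}
print(schedule_block([0, 1, 2, 5], ownership))  # block mostly touches GPU 0's pages
```

The design point this sketch captures is that the scheduler moves the (cheap) thread block to the data rather than migrating the (expensive) data to the thread block.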

