PhD Defense by Hyojong Kim

Event Details
  • Date/Time:
    • Monday, January 6, 2020
      12:00 pm - 2:00 pm
  • Location: Klaus 2100


Title: Techniques to Mitigate Performance Impact of Off-chip Data Migrations in Modern GPU Computing


Hyojong Kim

School of Computer Science

College of Computing

Georgia Institute of Technology


Date: Monday, Jan 6, 2020

Time: 12:00 PM - 2:00 PM (EST)

Location: Klaus 2100



Committee:

Dr. Hyesoon Kim (Advisor, School of Computer Science, Georgia Institute of Technology)

Dr. Ada Gavrilovska (School of Computer Science, Georgia Institute of Technology)

Dr. Milos Prvulovic (School of Computer Science, Georgia Institute of Technology)

Dr. Moinuddin Qureshi (School of Electrical and Computer Engineering, Georgia Institute of Technology)

Dr. Vivek Sarkar (School of Computer Science, Georgia Institute of Technology)



Graphics Processing Units (GPUs) have been used successfully for accelerating a wide variety of applications over the last decade. In response to growing compute and memory capacity requirements, modern systems are equipped to distribute the work over multiple GPUs and to pool the memory from the host (i.e., system memory) and other GPUs transparently. Compute capacity scales out with multiple GPUs, and the memory capacity afforded by the host is an order of magnitude larger than the GPUs’ device memory. However, both of these approaches require data to be migrated over the system interconnect (e.g., PCI-e) during program execution. Since migrating data over the system interconnect takes far longer than accessing the GPU’s internal memory hierarchy, the efficacy of these approaches in achieving high performance depends strongly on the data migration overhead. This dissertation proposes several techniques that help mitigate this data migration overhead.


In a system with multiple GPUs, where local and remote memory access latencies differ greatly, it is crucial to co-locate compute and data to achieve high performance. This thesis discusses how to enable co-location of compute and data in such systems. The proposed mechanism estimates the amount of exclusive data and selectively allocates it in a single GPU while distributing the shared data across multiple GPUs. For this selective coarse-grained allocation, it uses a dual address mode with lightweight changes to virtual-to-physical page mappings. To place compute in the same GPU as the data it accesses, it uses an affinity-based thread block scheduling policy. This enables efficient use of multiple GPUs while minimizing unnecessary off-chip data migrations.
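As a rough illustration of the placement and scheduling idea described above, the following Python sketch models a hypothetical two-GPU system: exclusive pages are pinned to a single owner GPU, shared pages are interleaved across GPUs, and each thread block is scheduled on the GPU that holds the majority of the pages it touches. All names, the page-to-GPU mapping, and the configuration are illustrative assumptions, not details from the dissertation.

```python
from collections import Counter

NUM_GPUS = 2  # hypothetical 2-GPU system

def home_gpu(page, exclusive_pages, owner_gpu):
    """Exclusive pages live on a single owner GPU; shared pages are
    interleaved (round-robin by page number) across all GPUs."""
    if page in exclusive_pages:
        return owner_gpu
    return page % NUM_GPUS

def schedule_block(pages_touched, exclusive_pages, owner_gpu):
    """Affinity-based policy: run the thread block on the GPU that
    holds the majority of the pages the block will access."""
    votes = Counter(home_gpu(p, exclusive_pages, owner_gpu)
                    for p in pages_touched)
    return votes.most_common(1)[0][0]

exclusive = {10, 11, 12}  # pages estimated to be exclusive, owned by GPU 0
# A block touching mostly exclusive data runs on the owner GPU:
print(schedule_block([10, 11, 12, 3], exclusive, owner_gpu=0))  # 0
# A block touching only shared pages follows the interleaving:
print(schedule_block([5, 7, 9], exclusive, owner_gpu=0))        # 1
```

With this policy, most of a block’s accesses hit pages resident on its own GPU, so off-chip traffic over the interconnect is limited to the minority of remote pages.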


Support for unified virtual memory and demand paging in modern GPUs provides a coherent view of a single virtual address space between CPUs and GPUs. This allows GPUs to access pages that reside in CPU memory as if they were local to the GPU, so applications that would otherwise be impossible to run due to memory capacity constraints run seamlessly. This thesis discusses how to alleviate major inefficiencies that arise in the page fault handling mechanism employed in contemporary GPUs. The proposed mechanism supports CPU-like thread block context switching to reduce the number of batches (i.e., groups of page faults handled together) and amortize the batch processing overhead. To take page eviction off the critical path, it modifies the runtime software to overlap page evictions with CPU-to-GPU page migrations without requiring any hardware changes.
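A back-of-envelope model may help convey why taking eviction off the critical path matters. In the sketch below (the function names and per-page transfer times are illustrative assumptions, not measurements from the dissertation), a fault under memory pressure must free a page by evicting a victim and then migrate the requested page in; if these happen serially, their costs add, whereas if evictions overlap with CPU-to-GPU migrations, each fault pays roughly the longer of the two transfers.

```python
def serial_fault_cost(evict_us, migrate_us, faults):
    """Baseline: each fault evicts a victim page and only then migrates
    the requested page -- eviction sits on the critical path."""
    return faults * (evict_us + migrate_us)

def overlapped_fault_cost(evict_us, migrate_us, faults):
    """Eviction overlapped with CPU-to-GPU migration: after one
    un-overlapped eviction to prime the pipeline, each fault costs
    only the longer of the two per-page transfers."""
    return evict_us + faults * max(evict_us, migrate_us)

FAULTS = 64
EVICT_US, MIGRATE_US = 20, 25  # hypothetical per-page transfer times
print(serial_fault_cost(EVICT_US, MIGRATE_US, FAULTS))      # 2880
print(overlapped_fault_cost(EVICT_US, MIGRATE_US, FAULTS))  # 1620
```

Under these assumed numbers the overlapped scheme hides most of the eviction time behind the inbound migrations, which is the effect the runtime-software change described above aims for.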

Invited Audience
Faculty/Staff, Public, Graduate students, Undergraduate students