Ph.D. Proposal Oral Exam - Si Li

Title: Managing Transient Reliability and Performance in GPU Applications

Committee: 

Dr. Yalamanchili, Advisor     

Dr. Wills, Chair

Dr. Kim

Abstract: The objective of the proposed research is to develop a framework for software-based, low-cost error detection in GPU applications that can adapt to dynamic changes in kernel resilience characteristics as well as environmental reliability factors. The proposed research consists of three parts: an adaptive software reliability enhancement (SRE) framework; a dynamic reliability management (DRM) scheme that leverages the SRE framework to control trade-offs between performance and reliability; and an SRE technique tailored to the unique properties of GPU execution. By accounting for the variation in reliability requirements, applications can reach the same level of resilience with lower overhead than any single technique provides.

Status

  • Workflow Status: Published
  • Created By: Daniela Staiculescu
  • Created: 04/26/2016
  • Modified By: Fletcher Moore
  • Modified: 10/07/2016
