PhD Defense by Chao Chen

        Chao Chen

    (Advisors: Dr. Santosh Pande and Dr. Greg Eisenhauer)

 

          will defend a doctoral thesis entitled, 

 

Compiler-Assisted Resilience Framework for Transient Fault Recovery

 

        On

 

  Monday, November 23 at 11:00 a.m. (EST)

      Location: *No Physical Location*

        BlueJeans: https://bluejeans.com/896561546

 

 

Abstract:

 

Due to system scaling trends toward smaller transistor sizes, higher circuit density, and the use of near-threshold voltage (NTV) techniques, transient hardware faults introduced by external noise, e.g., heat fluxes and particle strikes, have become a growing concern for current and upcoming extreme-scale high-performance computing (HPC) systems. Applications running on these systems are projected to experience transient errors more frequently than ever before, which will cause them either to generate incorrect outputs without warning users or to crash. Therefore, efficient resilience techniques against transient hardware faults are required for modern HPC applications.

 

This dissertation is concerned with the design, implementation, and evaluation of a lightweight resilience framework for large-scale scientific applications to mitigate the impact of transient hardware faults. In particular, it consists of three novel techniques: 1) LADR, a lightweight anomaly-based approach to protect scientific applications against transient-fault-induced silent data corruptions (SDCs); 2) CARE, a low-cost compiler-assisted technique to repair a crashed process on the fly when a crash-causing transient error is detected, so that applications can continue executing instead of simply being terminated and restarted; and 3) IterPro, which targets recovery from corruption of induction variables by exploiting side effects of modern compiler optimization techniques.

 

To limit runtime overhead during normal execution, these approaches exploit properties of scientific applications via compiler techniques. As a result of this design strategy, they incur only negligible (<3%) or even zero runtime overhead during normal execution while still achieving high fault coverage.

 

 

Committee:

Dr. Santosh Pande (advisor), School of Computer Science, Georgia Institute of Technology

Dr. Greg Eisenhauer (advisor), School of Computer Science, Georgia Institute of Technology

Dr. Ling Liu, School of Computer Science, Georgia Institute of Technology

Dr. Vivek Sarkar, School of Computer Science, Georgia Institute of Technology

Dr. Richard Vuduc, School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Franck Cappello, Mathematics and Computer Science Division, Argonne National Laboratory

 

 

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 11/09/2020
  • Modified By: Tatianna Richardson
  • Modified: 11/10/2020
