PhD Defense by Pradeep Fernando

Title: Adding Persistence to Main Memory Programming

 

Pradeep Fernando

School of Computer Science

College of Computing

Georgia Institute of Technology

https://www.cc.gatech.edu/grads/p/pfernand

 

Date: Monday, June 29th, 2020

Time: 11:00 AM - 1:00 PM (EDT)

Location: https://bluejeans.com/7183073013 (remote)

 

Committee:

Dr. Ada Gavrilovska (Advisor, School of Computer Science, Georgia Tech)

Dr. Joy Arulraj (School of Computer Science, Georgia Tech)

Dr. Tushar Krishna (School of Electrical and Computer Engineering, Georgia Tech)

Dr. Umakishore Ramachandran (School of Computer Science, Georgia Tech)

Dr. Amitabha Roy (Google)

 

Abstract:

Unlocking the true potential of the new persistent memories (PMEMs) requires eliminating traditional persistent I/O abstractions altogether and introducing persistence semantics directly into main memory programming. Such a programming model elevates failure atomicity to a first-class application property, alongside in-memory data layout, concurrency control, and fault tolerance. It therefore requires a redesign of programming abstractions, both for program correctness and for maximum performance gains. To address these challenges, this thesis proposes a set of system software designs that integrate persistence with main memory programming, and makes the following contributions.

 

First, this thesis proposes a PMEM-aware I/O runtime, NVStream, that supports fast durable streaming I/O. NVStream uses a memory-based I/O interface that integrates with an application's existing I/O data movement operations to accelerate persistent data writes. NVStream's persistent data storage layout and crash-consistency semantics are designed to match both application and PMEM characteristics. Specifically, we leverage the streaming nature of I/O in HPC workflows with a log-structured PMEM storage engine that uses relaxed write orderings and append-only failure-atomic semantics to form strongly consistent application checkpoints. Furthermore, we identify that optimizing the I/O software stack exposes PMEM bandwidth limitations as a bottleneck during parallel HPC I/O writes, and propose a novel data movement design -- PHX. PHX uses alternative network data movement paths available in datacenters to ease the bandwidth pressure on the PMEM memory interconnects, while maintaining the correctness of the persistent data.

 

Next, the thesis explores the challenges and opportunities of using PMEM for true main memory persistent programming -- a single data domain for both runtime and persistent application state. Such a programming model requires maintaining ACID properties during every update to the application's persistent structures. ACID-qualified persistent programming for multi-threaded applications is hard, as the programmer has to reason about both crash-consistency and synchronization -- crash-sync -- semantics to ensure program correctness. The thesis contributes a new understanding of the correctness requirements for mixing different crash-consistency and synchronization protocols, characterizes the performance of different crash-sync realizations across applications and hardware architectures, and draws actionable insights for future designs of PMEM systems.

 

Finally, application state stored on node-local persistent memory is still vulnerable to catastrophic node failures. The thesis proposes a replicated persistent memory runtime, Blizzard, that supports fault-tolerant, concurrent, and persistent data-structure programming. Blizzard carefully integrates userspace networking with byte-addressable PMEM to provide a fast persistent memory replication runtime. The design also incorporates a replication-aware crash-sync protocol that supports consistent and concurrent updates on persistent data structures. Blizzard offers applications the flexibility to use the data structures that best match their functional requirements, while offering better performance and providing crucial reliability guarantees lacking in existing persistent memory runtimes.

 

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 06/19/2020
  • Modified By: Tatianna Richardson
  • Modified: 06/19/2020
