PhD Defense by Alexander Merritt
Title: Efficient Programming of Massive-Memory Machines
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Friday, July 28, 2017
Time: 7-9AM PT / 10AM-12PM ET
Location: KACB 3402
Committee:
Dr. Ada Gavrilovska (Advisor, Committee Chair, School of Computer Science)
Dr. Taesoo Kim (School of Computer Science)
Dr. Kishore Ramachandran (School of Computer Science)
Dr. Moinuddin Qureshi (School of Electrical and Computer Engineering)
Dr. Dejan Milojicic (Hewlett Packard Labs, HPE)
Dr. Karsten Schwan (Co-advisor, School of Computer Science)
New and emerging memory technologies, combined with enormous growth in data collection and mining within industry, are giving rise to servers with massive pools of main memory: terabytes of memory, disaggregated bandwidth across tens of sockets, and hundreds of cores. But these systems are proving difficult to program efficiently, posing scalability challenges for all layers in the software stack, specifically in managing in-memory data sets. Larger and longer-lived data sets managed by key-value stores require minimizing overcommitment of memory, but current designs trade off performance scalability against memory bloat. Furthermore, opaque operating system abstractions like virtual memory, and the ill-matched, non-portable interfaces used to manipulate them, make it difficult to express semantic relationships between applications and their data: sharing in-memory data sets requires careful control over internal address mappings, but mmap, ASLR, and friends remove this control.
To explore and address these challenges, this dissertation is composed of two pieces:
(1) We introduce and evaluate a new design for key-value stores: a multi-head log-structured allocator that makes explicit use of a machine’s configuration to support linear scalability of common read- and write-heavy access patterns. Our implementation of this design, called Nibble, is written in 4k lines of Rust.
(2) Going beyond key-value stores, the second part of this dissertation introduces new general support within the operating system that enables applications to manage and share pointer-based in-memory data more explicitly: we give applications control over address space allocation and layout by promoting address spaces to an explicit abstraction. Processes may associate with multiple address spaces, and threads may arbitrarily switch between them to access data sets of effectively unbounded size without encountering the typical bottlenecks of legacy mmap interfaces. Our implementation of this design is in DragonFly BSD.