
PhD Defense by Wenqi Cao


Ph.D. Defense of Dissertation Announcement

 

Title: Dynamic and Elastic Memory Management in Virtualized Clouds

 

Wenqi Cao

Ph.D. Student

Systems

School of Computer Science

Georgia Institute of Technology

 

Date: March 26, 2019 (Tuesday)

Start Time: 9:00am

Location: KACB 3402

 

Committee

———————

Dr. Ling Liu (Advisor, School of Computer Science, Georgia Institute of Technology)

Dr. Calton Pu (Co-Advisor, School of Computer Science, Georgia Institute of Technology)

Dr. David Devecsery (School of Computer Science, Georgia Institute of Technology)

Dr. Joy Arulraj (School of Computer Science, Georgia Institute of Technology)

Dr. Gerald Lofstead (Scalable System Software Group, Sandia National Laboratories)

 

Abstract

———————

The memory capacity of computers and edge devices continues to grow: the DRAM capacity of low-end computers is in the tens to hundreds of GBs, and modern high performance computing (HPC) platforms can support terabytes of RAM for big-data-driven HPC and machine learning (ML) workloads. Although system virtualization improves resource consolidation, it does not tackle the increasing cost of address translation and the growing size of the page tables the OS kernel maintains; all virtual machines and processors rely on page tables for address translation. At the same time, big data and latency-sensitive applications are typically deployed in virtualized clouds using application deployment models composed of virtual machines (VMs), containers, and/or executors/JVMs. These applications enjoy high throughput and low latency if they are served entirely from memory. However, accurately estimating and allocating memory is difficult. When these applications cannot fit their working sets in the real memory of their VMs/containers/executors, they suffer large performance losses due to excessive page faults and thrashing. Even when unused memory is available on the host or on remote machines, in other VMs, containers, or executors, these applications are unable to share it. Existing proposals focus on estimating working set size for accurate allocation and on increasing the effective capacity of executors, but lack the desired transparency and efficiency.

 

This dissertation research takes a holistic approach to tackling the above problems along three dimensions. First, we present the design of FastSwap, a highly efficient shared memory paging facility. FastSwap's dynamic shared memory management scheme can effectively utilize shared memory across VMs through host coordination, with three original contributions. (1) FastSwap provides efficient support for multi-granularity compression of swap pages in both shared memory and disk swap devices. (2) FastSwap provides an adaptive scheme that flushes the least recently swapped-out pages to the disk swap partition when the shared memory swap partition reaches a pre-specified threshold and is close to full. (3) FastSwap provides batch swap-in optimizations. Our extensive experiments using big data analytics applications and benchmarks demonstrate that FastSwap offers performance improvements of up to two orders of magnitude over existing memory swapping methods.
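For illustration only, the following C sketch shows how such an adaptive flushing policy might look: when the shared memory swap partition crosses a capacity threshold, the least recently swapped-out pages are migrated to disk in batches. All names, structures, and thresholds here are assumptions made for the sketch, not FastSwap's actual implementation.

/* Minimal sketch of an adaptive flush policy, assuming the shared-memory
 * swap partition is tracked as an LRU list of swapped-out pages.
 * All names are illustrative; this is not FastSwap's actual API. */
#include <stddef.h>

#define SHM_SWAP_CAPACITY  (1UL << 20)   /* capacity in pages (assumed)   */
#define FLUSH_THRESHOLD    0.90          /* start flushing at 90% full    */
#define FLUSH_BATCH        64            /* pages flushed per pass        */

struct swap_page {
    struct swap_page *prev, *next;       /* LRU list linkage              */
    void *data;                          /* (possibly compressed) page    */
};

static struct swap_page *lru_head;       /* most recently swapped out     */
static struct swap_page *lru_tail;       /* least recently swapped out    */
static size_t shm_swap_used;             /* pages currently in shm swap   */

/* Write one page to the disk swap partition (placeholder). */
static void write_page_to_disk(struct swap_page *pg) { (void)pg; }

/* Unlink the LRU tail so it can be migrated to disk. */
static struct swap_page *pop_lru_tail(void)
{
    struct swap_page *pg = lru_tail;
    if (!pg)
        return NULL;
    lru_tail = pg->prev;
    if (lru_tail)
        lru_tail->next = NULL;
    else
        lru_head = NULL;
    return pg;
}

/* Called after each swap-out: if the shared-memory swap partition is
 * close to full, migrate the least recently swapped-out pages to disk. */
void maybe_flush_to_disk(void)
{
    while (shm_swap_used > (size_t)(SHM_SWAP_CAPACITY * FLUSH_THRESHOLD)) {
        for (int i = 0; i < FLUSH_BATCH && shm_swap_used > 0; i++) {
            struct swap_page *victim = pop_lru_tail();
            if (!victim)
                return;
            write_page_to_disk(victim);
            shm_swap_used--;
        }
    }
}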

 

Second, we develop XMemPod for non-intrusive host/remote memory sharing and for improving the performance of memory-intensive applications. It leverages the memory capacity of host machines and remote machines in the same cluster to provide on-demand, transparent, and non-intrusive sharing of unused memory, effectively removing the performance degradation that big data and ML workloads experience under transient or imbalanced memory pressure on a host or in a cluster. We demonstrate the benefits of the XMemPod design and of memory sharing through three optimizations. First, we provide elasticity, multi-granular compressibility, and failure isolation on shared memory pages. Second, we implement hybrid swap-out for better utilization of host and remote shared memory. Third, we support proactive swap-in from remote to host, from disk to host, and from host to guest, which significantly improves page-in operations and shortens the performance recovery time of applications under memory pressure. XMemPod is deployed on a virtualized RDMA cluster without any modifications to user applications or operating systems. Evaluated with multiple workloads on unmodified Spark, Apache Hadoop, Memcached, Redis, and VoltDB, XMemPod improves application throughput by 11x to 612x over the conventional OS disk swap facility, and by 1.7x to 14x over a representative existing remote memory paging system.
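As a hedged illustration of the hybrid swap-out idea, the following C sketch shows one way a placement decision could prefer host shared memory, then remote memory over RDMA, and fall back to disk. The names and structures are invented for this sketch and are not taken from XMemPod.

/* Minimal sketch of a hybrid swap-out placement decision: prefer unused
 * host shared memory, then unused remote memory over RDMA, then disk.
 * All names and fields are illustrative, not XMemPod's actual code. */
#include <stdbool.h>

enum swap_target { SWAP_TO_HOST_SHM, SWAP_TO_REMOTE_MEM, SWAP_TO_DISK };

struct memory_pool {
    unsigned long free_pages;     /* pages currently available */
    unsigned long reserve_pages;  /* headroom to keep free     */
};

static bool has_room(const struct memory_pool *p)
{
    return p->free_pages > p->reserve_pages;
}

/* Decide where the next victim page should be swapped out to. */
enum swap_target choose_swap_target(const struct memory_pool *host_shm,
                                    const struct memory_pool *remote_mem)
{
    if (has_room(host_shm))
        return SWAP_TO_HOST_SHM;    /* fastest: same-host shared memory */
    if (has_room(remote_mem))
        return SWAP_TO_REMOTE_MEM;  /* next: remote memory via RDMA     */
    return SWAP_TO_DISK;            /* last resort: local disk swap     */
}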

 

Third, we propose Xpage, a memory management framework that effectively mitigates fragmentation and makes the benefits of huge pages accessible to applications even under memory stress. Hardware manufacturers and operating systems have addressed growing DRAM capacity with better support for larger page sizes, called huge pages; their benefit is the performance gain from requiring fewer address translations and thus fewer cycles. Unlike Linux and Ingens, which demote huge pages to base pages once fragmentation occurs, Xpage never splits huge pages. Instead, it tracks allocated and unallocated memory regions and compacts all memory regions at the memory allocator level by placing them carefully within huge page boundaries. Xpage is a memory management redesign that brings performance gains and memory savings to memory-intensive applications with dynamic memory behavior.
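To make the huge-page-preserving placement idea concrete, the following C sketch shows an allocator-level scheme that places each region entirely within a 2 MB huge page, reusing partially filled pages when the region fits, so no huge page ever has to be split. The structures and names are illustrative assumptions, not Xpage's actual design.

/* Minimal sketch of allocator-level placement within huge-page boundaries:
 * a new region is placed in a partially used 2 MB huge page only if it
 * fits entirely inside that page, so the huge page never has to be split.
 * Names and structures are illustrative, not Xpage's actual design. */
#include <stddef.h>
#include <stdint.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* 2 MB huge page */
#define MAX_HUGE_PAGES 1024

struct huge_page_slot {
    uintptr_t base;        /* start address of the huge page */
    size_t    used;        /* bytes already handed out       */
};

static struct huge_page_slot slots[MAX_HUGE_PAGES];
static size_t nr_slots;

/* Reserve a fresh 2 MB huge page from the OS (placeholder). */
static uintptr_t reserve_huge_page(void) { return 0; }

/* Place a region of `size` bytes so that it never straddles a huge-page
 * boundary: reuse a partially filled huge page when the region fits,
 * otherwise start a new one. */
uintptr_t xpage_place(size_t size)
{
    if (size > HUGE_PAGE_SIZE)
        return 0;   /* large regions handled separately (not shown) */

    for (size_t i = 0; i < nr_slots; i++) {
        if (slots[i].used + size <= HUGE_PAGE_SIZE) {
            uintptr_t addr = slots[i].base + slots[i].used;
            slots[i].used += size;
            return addr;
        }
    }

    if (nr_slots == MAX_HUGE_PAGES)
        return 0;   /* out of tracked huge pages in this sketch */

    slots[nr_slots].base = reserve_huge_page();
    slots[nr_slots].used = size;
    return slots[nr_slots++].base;
}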

 

In this defense, I will present the design, implementation, and evaluation of XMemPod and Xpage.

 
