PhD Proposal by Juhyun Bae
Title: Distributed Services with Elastic Container Memory Abstractions for Big Data Clouds
Date: Friday, September 24th, 2021
Time: 4:00-6:00pm (ET)
School of Computer Science
Georgia Institute of Technology
Dr. Ling Liu (Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Calton Pu (School of Computer Science, Georgia Institute of Technology)
Dr. Joy Arulraj (School of Computer Science, Georgia Institute of Technology)
Dr. Said Mohamed Tawfik Issa (School of Computer Science, Georgia Institute of Technology)
Dr. Jay Lofstead (Sandia National Laboratories)
Big data powered machine learning has become an integral part of many products and services today. Solely relying on elastic scaling up to deal with resource limits in Cloud and at Edge may not be sufficient, especially when the workload pattern and volume are diverse and hard to predict, let alone the cost of scaling up and the presence of unutilized resources in cross-container and cross-cluster settings. Although container is an essential server technology for multi cloud, hybrid cloud and edge cloud computing, the performance of containers on a host node is affected by the memory usage of other co-hosted containers due to the shared common pool of host memory. Transient load surge can cause serious performance degradation when the container runtime is out of allocated memory or the host memory runs out. This dissertation research addresses these problems by developing distributed services with elastic container memory abstraction, enabling container runtime to utilize free memory on the same host and in remote nodes of a cluster with three unique contributions.
We first design and develop the elastic host memory abstraction to allow cross-container memory sharing based on dynamic and transient memory demands of container runtime, enabling containers to elastically expand or shrink their available memory on demand. This new cross container memory abstraction enables a container to manage the unexpected large working set memory gracefully rather than relying on restarting of the container to increase the memory capacity. This new memory abstraction can be incorporated to container runtime execution transparently without any modification to the host operating system and the application.
Next, we design and develop the elastic host-remote memory abstraction to allow cross-container and cross node memory sharing. This enables containers to flexibly expand its memory demand to remote memory in the cluster in response to the unexpected demand on the working set memory when the host memory is insufficient to accommodate the demand. By efficiently leveraging available remote free memory in the cluster, it incorporates the remote network memory as a part of the runtime memory hierarchy of containers transparently without any impact on existing OS and applications running in the containers on both host and remote nodes. By enabling containers to leverage remote network memory, which is two orders of magnitude faster than disk I/O, our elastic host-remote memory abstraction can provide graceful performance adaptation to the demand of memory intensive applications, alleviating the restarting of containers for expanding its memory allocation or the migration of container execution when its host memory is insufficient to accommodate the additional memory demand.
We implement our elastic remote network memory abstraction on top of RDMA to utilize its one-sided read / write operations. Several RDMA optimizations are introduced to provide efficient communication fabrics for enabling containers to leverage our cross-node memory abstraction and achieve maximum throughput. For example, we provide efficient merging and chaining of requests to reduce the cost of I/O. We develop new event polling mechanism to adapt to diverse workload patterns. To demonstrate the effectiveness of our elastic host-remote container memory abstraction, we implement a transparent network memory storage for container execution (CNetMem) with two novel features. We introduce group based remote memory abstraction to enable our elastic container memory abstraction to be scalable to large cluster size by managing a controlled pool of concurrent connections. Our grouping strategy uses pre-connection and mapping warm up to avoid connecting and mapping remote memory via the performance critical path. We then introduce rank-based node selection algorithm to find the most suitable remote node for mapping remote memory block by considering memory increasing and decreasing trend on the remote node as well as the amount of free memory available, which can help avoiding or minimizing remote eviction. We also utilize a new hybrid batching technique tailored especially for replication or disk-backup to increase fault tolerance with minimal performance impact.
Our ongoing research is focused on providing scalable distributed data services across a large population of edge clients (e.g., end devices) with high churning edge network. We develop a two-tier edge overlay network, with the top tier overlay offering dynamic and hierarchical grouping to support two types of distributed application services: scalable remote file system and federated learning system, and the bottom tier providing efficient and transparent connection management for decentralized management of hierarchical peer to peer computation with high dependability and fault tolerance. In this proposal exam, I will provide detailed overview of CNetMem in terms of design, development, and evaluations with a dozen of key-value system workloads and machine learning workloads.