event

Phd Proposal by Steffan Maass

Primary tabs

Title: Systems Abstractions for Big Data Processing on a Single Machine

 

Steffen Maass

School of Computer Science

College of Computing

Georgia Institute of Technology

 

Date: Thursday, November 1st, 2018

Time: 4:30pm EDT

Location: KACB 3126

 

Committee:

------------

Dr. Taesoo Kim (Advisor, School of Computer Science, Georgia Tech) Dr. Ada Gavrilovska (School of Computer Science, Georgia Tech) Dr. Umakishore Ramachandran (School of Computer Science, Georgia Tech) Dr. Tushar Krishna (School of Electrical Engineering, Georgia Tech) Dr. Willy Zwaenepoel (Faculty of Engineering and Information Technologies, The University of Sydney)

 

 

Abstract:

-----------

Large-scale internet services, such as Facebook or Google, are using clusters of many servers for problems such as search, machine learning, and social networks.

However, while it may be possible to apply the tools used at this scale to smaller, more common problems as well, this thesis presents approaches to large-scale data processing on only a single machine.

This approach has obvious cost benefits and lowers the barrier of entrance to large-scale data processing.

This thesis approaches this problem from both an operating systems perspective as well as a redesign of applications to enable trillion-scale graph processing on a single machine.

 

We first present an asynchronous scheme for clearing the processors' translation lookaside buffers (TLBs) in response to the high overhead of the current, synchronous process known as a TLB shootdown.

This process is critical for system services such as freeing memory, NUMA memory migration, and page swapping in emerging, disaggregated data centers.

The key idea of this scheme, Latr, is a lazy mechanism to remove entries from the cores' TLBs while ensuring correctness by lazily releasing virtual memory only after Latr's lazy shootdown mechanism finishes.

This scheme removes the current overhead of costly inter-processor interrupts.

We show that this mechanism has impacts on many applications from webservers and key-value stores to graph processing.

 

Second, this thesis presents a new out-of-core graph processing engine, called Mosaic, for executing graph algorithms on trillion-scale datasets on a single machine.

Mosaic makes use of many-core processors and PCIe-SSDs coupled with a novel graph encoding scheme to allow processing of graphs of up to one trillion edges on a single machine.

Mosaic also employs a locality-preserving curve to allow for high compression and high locality when storing graphs and executing algorithms.

 

Third, this thesis proposes a new engine for processing evolving graphs, Kaleidoscope, based on insights about achieving high compression and locality while reducing common overheads for keeping the graph well organized.

This is an important enabling step for emerging workloads when processing graphs that change over time.

Status

  • Workflow Status:Published
  • Created By:Tatianna Richardson
  • Created:10/24/2018
  • Modified By:Tatianna Richardson
  • Modified:10/24/2018

Categories

Keywords