event

Phd Defense by Steffen Maass

Primary tabs

Title: Systems Abstractions for Big Data Processing on a Single Machine

 

Steffen Maass

School of Computer Science

College of Computing

Georgia Institute of Technology

 

Date: Wednesday, April 3, 2019

Time: Noon EDT

Location: KACB 3100

 

Committee:

------------

Dr. Taesoo Kim (Advisor, School of Computer Science, Georgia Tech) Dr. Ada Gavrilovska (School of Computer Science, Georgia Tech) Dr. Umakishore Ramachandran (School of Computer Science, Georgia Tech) Dr. Tushar Krishna (School of Electrical Engineering, Georgia Tech) Dr. Willy Zwaenepoel (Faculty of Engineering and Information Technologies, The University of Sydney)

 

Abstract:

-----------

 

Large-scale internet services, such as Facebook or Google, are using clusters of many servers for problems such as search, machine learning, and social networks.

However, while it may be possible to apply the tools used at this scale to smaller, more common problems as well, this dissertation presents approaches to large-scale data processing on only a single machine.

This approach has obvious cost benefits and lowers the barrier of entrance to large-scale data processing.

This dissertation approaches this problem by redesigning applications to enable trillion-scale graph processing on a single machine, enable the processing of evolving, billion-scale graphs, and presenting an operating-systems level optimization.

 

First, this dissertation presents a new out-of-core graph processing engine, called Mosaic, for executing graph algorithms on trillion-scale datasets on a single machine.

Mosaic makes use of many-core processors and PCIe-SSDs coupled with a novel graph encoding scheme to allow processing of graphs of up to one trillion edges on a single machine.

Mosaic also employs a locality-preserving curve to allow for high compression and high locality when storing graphs and executing algorithms.

Second, this dissertation presents Cytom, a new engine for processing evolving graphs based on insights about achieving high compression and locality while improving load-balancing when processing a graph that changes rapidly.

Cytom also introduces a novel programming model that takes advantage of its subgraph-centric approach coupled with the setting of evolving graphs.

This is an important enabling step for emerging workloads when processing graphs that change over time.

Finally, we present an asynchronous scheme for clearing the processors' translation lookaside buffers (TLBs) in response to the high overhead of the current, synchronous process known as a TLB shootdown.

This process is critical for system services such as freeing memory, NUMA memory migration, and page swapping in emerging, disaggregated data centers; these services are often used when processing large amounts of data.

The key idea of this scheme, Latr, is a lazy mechanism to remove entries from the cores' TLBs while ensuring correctness by lazily releasing virtual memory only after Latr's lazy shootdown mechanism finishes.

This scheme removes the current overhead of costly inter-processor interrupts.

We show that this mechanism has impacts on many applications ranging from webservers which might be used as caching frontends for big data processing to key-value stores and graph processing.

Status

  • Workflow Status:Published
  • Created By:Tatianna Richardson
  • Created:03/25/2019
  • Modified By:Tatianna Richardson
  • Modified:03/25/2019

Categories

Keywords