Ph.D. Proposal by Kisung Lee

Title: Scalable Big Data Systems: Architectures and Optimizations

Kisung Lee
School of Computer Science
College of Computing
Georgia Institute of Technology

Date: Monday, October 20, 2014
Time: 9:30 AM - 11:30 AM EDT
Location: KACB 3402

Committee:
Dr. Ling Liu (Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Ed Omiecinski (School of Computer Science, Georgia Institute of Technology)
Dr. Calton Pu (School of Computer Science, Georgia Institute of Technology)
Dr. Karsten Schwan (School of Computer Science, Georgia Institute of Technology)
Dr. Lakshmish Ramaswamy (Department of Computer Science, University of Georgia)

Abstract:
Big data analytics has become not only a popular buzzword but also a strategic information technology direction for many enterprises and government organizations. This dissertation research is dedicated to novel architectural designs and optimization techniques for building big data systems that offer elastic scalability. We make three contributions to address the technical challenges of big data processing, centered on graph datasets and on mobile/spatial datasets. First, we have developed a suite of graph partitioning algorithms that run much faster than existing data distribution methods and scale naturally as data volumes grow. The main idea of our approach is to partition a big graph while preserving its core computation data structure as much as possible, so as to maximize intra-server computation and minimize inter-server communication. Second, we have developed a distributed framework for iterative graph computations that maximizes access locality and minimizes distributed messaging cost. Our initial experimental evaluation shows that this framework can significantly outperform Apache Hama on big graphs with a large number of edges. Third, we have developed optimization techniques for scaling mobile data processing along three orthogonal dimensions: (i) scalable processing of a large number of spatial alarms for mobile users traveling on road networks, (ii) scalable location tagging techniques for improving the quality of Twitter data analytics and prediction accuracy, and (iii) a lightweight spatial indexing technique for improving search performance over big spatial data. In this dissertation proposal exam, I will briefly highlight these technical contributions and then focus on our semantic hashing-based graph partitioning techniques, including their system architecture, optimizations, and experimental evaluation.
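The partitioning goal stated in the abstract (maximize intra-server computation, minimize inter-server communication) can be made concrete with a small sketch. The Python below is a hypothetical illustration, not the semantic hashing-based method presented in the proposal: it assumes integer vertex IDs, places each vertex on a server by a simple modulo hash, counts edges whose endpoints land on different servers (a rough proxy for communication cost), and then replicates each owned vertex's 1-hop out-neighbors so that those edges can be processed locally at the cost of extra storage. All function names are assumptions made for this example.

```python
def hash_partition(vertices, k):
    """Hypothetical baseline: assign each integer vertex ID to one of k servers."""
    return {v: v % k for v in vertices}


def edge_cut(edges, owner):
    """Count edges whose endpoints live on different servers.

    Each such edge would require inter-server communication when a
    computation follows it.
    """
    return sum(1 for u, v in edges if owner[u] != owner[v])


def one_hop_extension(edges, owner, k):
    """Return, per server, its owned vertices plus their 1-hop out-neighbors.

    Replicating the out-neighbors lets every outgoing edge of an owned
    vertex be evaluated locally, trading storage for fewer remote lookups.
    """
    parts = [set() for _ in range(k)]
    for v, p in owner.items():
        parts[p].add(v)
    for u, v in edges:
        parts[owner[u]].add(v)
    return parts


def replication_factor(parts, num_vertices):
    """Average number of copies stored per vertex across all servers."""
    return sum(len(p) for p in parts) / num_vertices


if __name__ == "__main__":
    # A 4-cycle with one chord, spread over 2 servers.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    owner = hash_partition(range(4), 2)
    print("cut edges:", edge_cut(edges, owner))  # 4
    parts = one_hop_extension(edges, owner, 2)
    print("replication factor:", replication_factor(parts, 4))  # 2.0
```

On this toy graph, plain hash placement cuts 4 of the 5 edges; the 1-hop extension makes every outgoing edge local but stores each vertex on both servers. Balancing locality against replication is the trade-off that more careful partitioners try to manage.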