PhD Defense by M. Emre Gursoy
Title: Privacy-Preserving Data Collection and Sharing in Modern Mobile Internet Systems
M. Emre Gursoy
School of Computer Science, Georgia Institute of Technology
Date: May 7th, 2020 (Thursday)
Time: 11:00 AM - 1:00 PM
** Note: This defense is remote-only due to the institute's guidelines on COVID-19 **
Committee:
Dr. Ling Liu (advisor) - School of Computer Science, Georgia Institute of Technology
Dr. Joy Arulraj - School of Computer Science, Georgia Institute of Technology
Dr. Margaret Loper - Georgia Tech Research Institute
Dr. Calton Pu - School of Computer Science, Georgia Institute of Technology
Dr. Christin Seifert - University of Twente, The Netherlands
Abstract:
With the widespread use of mobile devices such as laptops, smartphones, smartwatches, and IoT devices, large volumes of user data are generated and recorded. While there is great value in collecting, analyzing, and sharing this data to improve products and services, data privacy is a major concern.
This dissertation research addresses the problem of privacy-preserving data collection and sharing in the context of both mobile trajectory data and mobile Internet access data. The first contribution is the design and development of AdaTrace, a system for utility-aware synthesis of differentially private and attack-resilient location traces. Given a set of real location traces, AdaTrace executes a four-phase process consisting of feature extraction, synopsis construction, noise injection, and generation of synthetic location traces. Compared to representative prior approaches, the location traces generated by AdaTrace offer up to a 3-fold improvement in utility, measured using a variety of utility metrics and datasets, while preserving both differential privacy and attack resilience.
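The four phases named above can be illustrated with a deliberately simplified sketch. This is not AdaTrace itself: the function name `synthesize`, the fixed public grid, the per-trace truncation to `max_len` points, and the use of the Laplace mechanism on cell visit counts are all assumptions chosen for illustration, and the sketch captures spatial density only, not movement patterns or attack resilience.

```python
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def synthesize(traces, grid_w, grid_h, cell_size, epsilon, max_len, n_out, seed=0):
    """Toy density-only synthesizer (hypothetical, for illustration)."""
    # A seeded PRNG keeps the example reproducible; a real deployment
    # would need a cryptographically secure noise source.
    rng = random.Random(seed)
    cells = [(i, j) for i in range(grid_w) for j in range(grid_h)]

    # Phase 1, feature extraction: map each point to a grid cell.
    # Each trace is truncated to max_len points so that one trace can
    # change the histogram by at most max_len in L1 norm.
    counts = Counter()
    for trace in traces:
        for x, y in trace[:max_len]:
            i = min(max(int(x // cell_size), 0), grid_w - 1)
            j = min(max(int(y // cell_size), 0), grid_h - 1)
            counts[(i, j)] += 1

    # Phase 2, synopsis construction: a histogram over the full public
    # grid, including cells that no real trace visited.
    synopsis = [counts[c] for c in cells]

    # Phase 3, noise injection: Laplace noise with scale sensitivity / epsilon.
    scale = max_len / epsilon
    noisy = [max(0.0, v + laplace(scale, rng)) for v in synopsis]
    if sum(noisy) == 0:
        noisy = [1.0] * len(cells)  # fall back to a uniform density

    # Phase 4, generation: sample cells from the noisy density and place
    # each synthetic point uniformly at random inside its cell.
    synthetic = []
    for _ in range(n_out):
        picked = rng.choices(cells, weights=noisy, k=max_len)
        synthetic.append([((i + rng.random()) * cell_size,
                           (j + rng.random()) * cell_size) for i, j in picked])
    return synthetic
```

Because phase 4 reads only the noisy synopsis and never the raw traces, the synthetic output inherits the differential privacy guarantee of phase 3 by post-processing, under the assumption that each individual contributes one trace.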
The second contribution of this dissertation research is the design and development of locally private protocols for privacy-sensitive collection of mobile and Web user data. Motivated by the excessive utility loss of existing Local Differential Privacy (LDP) protocols under small user populations, this dissertation introduces the notion of Condensed Local Differential Privacy (CLDP) and a suite of protocols satisfying CLDP. These protocols enable the collection of various types of user data, ranging from ordinal data types in finite metric spaces (malware infection statistics), to non-ordinal items (OS versions and transaction categories), and to sequences of ordinal or non-ordinal items. Using cybersecurity data and case studies from Symantec, a major cybersecurity vendor, we show that the proposed CLDP protocols are practical for key tasks including malware outbreak detection, OS vulnerability analysis, and inspecting suspicious activities on infected machines.
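To make the baseline concrete, the sketch below shows generalized randomized response (GRR), a standard LDP protocol for categorical values. It is not one of the CLDP protocols introduced in the dissertation, and the function names and the example domain are hypothetical. Each user perturbs their value locally, and the collector debiases the aggregated reports.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Client side: keep the true value with probability p, otherwise
    report one of the other k - 1 values uniformly at random."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Collector side: unbiased count estimates from perturbed reports.
    Requires at least two domain values and epsilon > 0."""
    k, n = len(domain), len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting any specific other value
    observed = Counter(reports)
    # E[observed[v]] = n_v * p + (n - n_v) * q, solved for n_v.
    return {v: (observed[v] - n * q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(42)
    domain = ["os_a", "os_b", "os_c", "os_d"]  # hypothetical OS versions
    truth = ["os_a"] * 600 + ["os_b"] * 300 + ["os_c"] * 100
    reports = [grr_perturb(v, domain, 1.0, rng) for v in truth]
    print(grr_estimate(reports, domain, 1.0))
```

The estimates are unbiased, but their relative error shrinks only on the order of one over the square root of the number of users, which is consistent with the small-population utility loss that motivates CLDP.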
The third contribution of this dissertation research is the development of LDPLens, a framework and prototype system for evaluating privacy-utility tradeoffs of different LDP protocols. LDPLens introduces metrics to evaluate protocol tradeoffs based on factors such as the utility metric, the data collection scenario, and the user-specified adversary metric. We develop a common Bayesian adversary model to analyze LDP protocols, and we formally and experimentally analyze adversary Success Rate (SR) under each protocol. Motivated by the finding that numerous factors impact the SR and utility behaviors of LDP protocols, we develop LDPLens to provide effective recommendations for finding the most suitable protocol in a given setting. Our three case studies with real-world datasets demonstrate that using the protocol recommended by LDPLens can offer a substantial reduction in utility loss or in adversary SR, compared to using a randomly chosen protocol.
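The announcement does not spell out the adversary model, so the following is only one plausible reading, stated as an assumption: a Bayes-optimal adversary who knows the prior over true values and the protocol's perturbation probabilities, observes one report, and guesses the most likely true value. The helper names are hypothetical, and generalized randomized response stands in for "an LDP protocol"; this is not LDPLens code.

```python
import math


def grr_channel(k, epsilon):
    """Perturbation matrix of generalized randomized response:
    channel[v][y] = Pr[report = y | true value = v]."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    return [[p if y == v else q for y in range(k)] for v in range(k)]


def bayes_success_rate(prior, channel):
    """Probability that the adversary's maximum-a-posteriori guess is correct:
    SR = sum over reports y of max over values v of prior[v] * channel[v][y]."""
    n_reports = len(channel[0])
    return sum(
        max(prior[v] * channel[v][y] for v in range(len(prior)))
        for y in range(n_reports)
    )


if __name__ == "__main__":
    k, eps = 4, 1.0
    uniform = [1.0 / k] * k
    skewed = [0.97, 0.01, 0.01, 0.01]
    print(bayes_success_rate(uniform, grr_channel(k, eps)))
    print(bayes_success_rate(skewed, grr_channel(k, eps)))
```

Under a uniform prior the adversary's best guess is the reported value, so SR equals the probability of truthful reporting. Under a heavily skewed prior the adversary can ignore the report and always guess the most common value, which illustrates why SR depends on the data distribution as well as on the privacy budget.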
In this dissertation defense, I will give an overview of my PhD dissertation research and focus on the motivation, design, and development of LDPLens.