PhD Defense by Osman Emre Dai

Primary tabs

Name: Osman Emre Dai 

Title: Fundamental Limits and Algorithms for Database and Graph Alignment


    Dr. Negar Kiyavash, Business Analytics (Ecole Polytechnique Federale de Lausanne)

    Dr. Mohit Singht, School of Indsutrial and Systerms Engineering (Georgia Tech)

Committee Members:

    Dr. Daniel Cullina, School of Electrical Engineering and Computer Science (Pennsylvania State University)

    Dr. Cheng Mao, School of Mathematics (Georgia Tech)

    Dr. Ashwin Pananjady, Electrical Engineering and Computer Science (Georgia Tech)


    Dr. Cheng Mao, School of Mathematics (Georgia Tech),

Time: Tuesday, December 5th at 1:00 PM

Location: ISyE Groseclose 226

Meeting Link (Zoom): https://gatech.zoom.us/j/91577958731


Data alignment refers to a class of problems where given two sets of anonymized  data pertaining  to overlapping sets of users, the goal is  to identify the correspondences between the two sets. If the data of a user is contained in both sets, the correlation between the two data points associated with the user might make it possible to determine that both belong to the same user and hence link the data points. Alignment problems are of practical interest in applications such as privacy and data junction. Data alignment can be used to de-anonymize data, therefore,  studying the feasibility of alignment allows for a more reliable understanding of the limitations of anonymization schemes put in place to protect against privacy breaches. Additionally, data alignment can aid in finding the correspondence between data from different sources, e.g. different sensors. The data fusion performed through data alignment in turn can help with variety of inference problems that arise in  scientific and engineering applications.


This thesis considers two types of data alignment problems: database and graph alignment. Database alignment refers to the setting where each feature (i.e. data points) in a data set is associated with a single user. Graph alignment refers to the setting where data points in each data set are associated with pairs of users. For both problems, we are particularly interested in the asymptotic case where n, the number of users with data in both sets, goes to infinity. Nevertheless our analyses often yield results applicable to the finite n case. To develop a preliminary understanding of the database alignment problem, we first study the closely related problem of planted matching with Gaussian weights of unit variance, and derive tight achievability bounds that match our converse bounds: Specifically we identify different inequalities between log n and the signal strength (which corresponds to the square of the difference between the mean weights of planted and non-planted edges) that guarantee upper bounds on the log of the expected number of errors. Then, we study the database alignment problem with Gaussian features in the low per-feature correlation setting where the number of dimensions of each feature scales as ω(log n):  We derive inequalities between log n and signal strength (which, for database alignment, corresponds to the mutual information between correlated features) that guarantee error bounds matching those of the planted matching setting, supporting the claimed connection between the two problems. Then, relaxing the restriction on the number of dimensions of features, we derive conditions on signal strength and dimensionality that guarantee smaller upper bounds on the log of the expected number of errors. The stronger results in the O(log n)-dimensional-feature setting for Gaussian databases show how planted matching, while useful, is not a perfect substitute to understand the dynamics of the more complex problem of database alignment. For graph alignment, we focus on the correlated Erdős–Rényi graph model where the data point (i.e. edge) associated with each pair of users in a graph is a Bernoulli random variable that is correlated with the data point associated with the same pair in the other graph. We study a canonical labeling algorithm for alignment and identify conditions on the density of the graphs and correlation between edges across graphs that guarantees the recovery of the true alignment with high probability.


  • Workflow Status:Published
  • Created By:Tatianna Richardson
  • Created:11/27/2023
  • Modified By:Tatianna Richardson
  • Modified:11/27/2023



Target Audience