event

PhD Defense by Jingfan Meng

Primary tabs

Title: Efficient Locality Sensitive Hashing Approaches, Primitives, and Applications


Jingfan Meng
Ph.D. Candidate

School of Computer Science, Colleague of Computing

Georgia Tech

 

Date: Wednesday, July 16th, 2025 

Time: 1:00 PM -- 3:00 PM EDT 

Location: Klaus 3402

Microsoft Teams MeetingLink

 

Committee:

Dr. Jun (Jim) Xu (Ph.D. advisor), School of Computer Science, Georgia Tech

Dr. Kexin Rong, School of Computer Science, Georgia Tech

Dr. Joy Arulraj, School of Computer Science, Georgia Tech

Dr. Santosh Vempala, School of Computer Science, Georgia Tech

Dr. Mitsunori Ogihara, Department of Computer Science, University of Miami

 

 

Abstract:

Approximate nearest neighbor search (ANNS) is a fundamental algorithmic problem with numerous applications in many areas of computer science, especially databases and machine learning. It is an intriguing question how to build index data structures in support for efficient ANNS under various useful distance metrics. Locality sensitive hashing (LSH) is a longstanding and reputable solution approach to ANNS that offers a small index size and reduced costs for index construction and data changes. With numerous LSH approaches in the literature, however, most research has been limited to the collision probability of LSH functions, and many important aspects, such as the fundamental time complexities of evaluating these randomized LSH functions, have been overlooked for long.

 

My research spans both the design of new LSH approaches for ANNS under hard-to-query distance metrics and the design of new algorithms that speed up the evaluation of these LSH functions. In my dissertation, I present novel LSH-based ANNS solutions to two such distance metrics -- point-to-subspace metric in L1 (P2SL1), and ANNS-ALT (after linear transform) -- both of which are useful but currently lack efficient solutions. I also present two algorithmic techniques -- efficient range summation (ERS) and fast Gaussian orthogonal ensemble quadratic form (FGoeQF) -- that help build up LSH primitives that are faster than existing versions by orders of magnitudes. Finally, I present candidate-based density estimation (CanDE), which is an LSH-based application for getting important data analytics in the neighborhood of the query, and CommonSense, a communication-efficient protocol for computing the set intersection based on compressed sensing (a randomized algorithm deeply related to LSH).

 

 

Status

  • Workflow Status:Published
  • Created By:Tatianna Richardson
  • Created:07/07/2025
  • Modified By:Tatianna Richardson
  • Modified:07/07/2025

Categories

Keywords

Target Audience