event
PhD Defense by Jingfan Meng
Primary tabs
Title: Efficient Locality Sensitive Hashing Approaches, Primitives, and Applications
Jingfan Meng
Ph.D. Candidate
School of Computer Science, Colleague of Computing
Georgia Tech
Date: Wednesday, July 16th, 2025
Time: 1:00 PM -- 3:00 PM EDT
Location: Klaus 3402
Microsoft Teams Meeting: Link
Committee:
Dr. Jun (Jim) Xu (Ph.D. advisor), School of Computer Science, Georgia Tech
Dr. Kexin Rong, School of Computer Science, Georgia Tech
Dr. Joy Arulraj, School of Computer Science, Georgia Tech
Dr. Santosh Vempala, School of Computer Science, Georgia Tech
Dr. Mitsunori Ogihara, Department of Computer Science, University of Miami
Abstract:
Approximate nearest neighbor search (ANNS) is a fundamental algorithmic problem with numerous applications in many areas of computer science, especially databases and machine learning. It is an intriguing question how to build index data structures in support for efficient ANNS under various useful distance metrics. Locality sensitive hashing (LSH) is a longstanding and reputable solution approach to ANNS that offers a small index size and reduced costs for index construction and data changes. With numerous LSH approaches in the literature, however, most research has been limited to the collision probability of LSH functions, and many important aspects, such as the fundamental time complexities of evaluating these randomized LSH functions, have been overlooked for long.
My research spans both the design of new LSH approaches for ANNS under hard-to-query distance metrics and the design of new algorithms that speed up the evaluation of these LSH functions. In my dissertation, I present novel LSH-based ANNS solutions to two such distance metrics -- point-to-subspace metric in L1 (P2SL1), and ANNS-ALT (after linear transform) -- both of which are useful but currently lack efficient solutions. I also present two algorithmic techniques -- efficient range summation (ERS) and fast Gaussian orthogonal ensemble quadratic form (FGoeQF) -- that help build up LSH primitives that are faster than existing versions by orders of magnitudes. Finally, I present candidate-based density estimation (CanDE), which is an LSH-based application for getting important data analytics in the neighborhood of the query, and CommonSense, a communication-efficient protocol for computing the set intersection based on compressed sensing (a randomized algorithm deeply related to LSH).
Groups
Status
- Workflow Status:Published
- Created By:Tatianna Richardson
- Created:07/07/2025
- Modified By:Tatianna Richardson
- Modified:07/07/2025
Categories
Keywords
Target Audience