event
PhD Proposal by Huayi Wang
Primary tabs
Title: Approximate Nearest Neighbor Search for Multiple Distance Metrics: Algorithms, Index Structures, and Applications
Date: Monday, Dec. 1, 2025
Time: 11:30AM – 1PM Eastern Time (US)
Location: Klaus Building 3402
Teams link: Huayi’s PhD proposal | Meeting-Join | Microsoft Teams
Committee
Dr. Jun (Jim) Xu - Advisor, Georgia Tech, School of Computational Science
Dr. Kexin Rong - Georgia Tech, School of Computational Science
Dr. Joy Arulraj- Georgia Tech, School of Computational Science
Abstract:
Approximate nearest neighbor search (ANNS) is a fundamental algorithmic problem with numerous applications in many areas of computer science, especially databases and machine learning. An intriguing question is how to build index data structures that support efficient ANNS under various useful distance metrics. However, many practically important distance metrics—such as Hamming distance, edit distance, Manhattan distance, and universal distances—still lack effective index structures for efficient ANN search. Although Euclidean distance has been extensively studied, these other metrics are equally important in real applications but still lack efficient ANNS solutions.
This thesis proposes several ANNS solutions for different distance metrics. For Manhattan distance, we propose MP-RW-LSH, the first multi-probe Locality Sensitive Hashing (LSH) scheme tailored for distance. Compared with the state-of-the-art LSH solution, MP-RW-LSH significantly reduces the number of hash tables required while maintaining similar query accuracy. For Hamming and edit distance, we propose indexable distance estimating codes called iDEC, which extend the error estimating coding (EEC) technique in the networking area to the ANNS problem by treating error estimating coding as a new distance estimation method. We also propose U-HNSW for universal Lp distance metrics. U-HNSW can efficiently answer ANNS queries under the Lp distance without building multiple graph indices for different p values. Beyond ANNS algorithms, this thesis also studies applications where approximate distance estimation used in ANNS algorithms brings benefits. We propose OddEEC, a new EEC for wireless networking. OddEEC extends Odd Sketch for symmetric difference cardinality estimation to estimate the bit error rate of the communication channel, achieving much faster estimation time than existing EEC schemes while maintaining similar accuracy. Finally, this thesis studies how ANN principles can help reduce the computational cost of the LLM reward model. By training a decision tree to approximate reward-model scores, the proposed system can select promising candidates without invoking the reward model on every possible response.
Groups
Status
- Workflow Status:Published
- Created By:Tatianna Richardson
- Created:11/20/2025
- Modified By:Tatianna Richardson
- Modified:11/20/2025
Categories
Keywords
Target Audience