event

PhD Proposal by Huayi Wang

Primary tabs

Title: Approximate Nearest Neighbor Search for Multiple Distance Metrics: Algorithms, Index Structures, and Applications

 

Date: Monday, Dec. 1, 2025

Time: 11:30AM – 1PM Eastern Time (US)

Location: Klaus Building 3402

Teams link:  Huayi’s PhD proposal | Meeting-Join | Microsoft Teams

 

 

Committee

Dr. Jun (Jim) Xu - Advisor, Georgia Tech, School of Computational Science

Dr. Kexin Rong - Georgia Tech, School of Computational Science

Dr. Joy Arulraj- Georgia Tech, School of Computational Science

 

Abstract:

Approximate nearest neighbor search (ANNS) is a fundamental algorithmic problem with numerous applications in many areas of computer science, especially databases and machine learning. An intriguing question is how to build index data structures that support efficient ANNS under various useful distance metrics. However, many practically important distance metrics—such as Hamming distance, edit distance, Manhattan distance, and universal distances—still lack effective index structures for efficient ANN search. Although Euclidean distance has been extensively studied, these other metrics are equally important in real applications but still lack efficient ANNS solutions.

 

This thesis proposes several ANNS solutions for different distance metrics. For Manhattan distance, we propose MP-RW-LSH, the first multi-probe Locality Sensitive Hashing (LSH) scheme tailored for distance. Compared with the state-of-the-art LSH solution, MP-RW-LSH significantly reduces the number of hash tables required while maintaining similar query accuracy. For Hamming and edit distance, we propose indexable distance estimating codes called iDEC, which extend the error estimating coding (EEC) technique in the networking area to the ANNS problem by treating error estimating coding as a new distance estimation method. We also propose U-HNSW for universal Lp distance metrics. U-HNSW can efficiently answer ANNS queries under the Lp distance without building multiple graph indices for different p values. Beyond ANNS algorithms, this thesis also studies applications where approximate distance estimation used in ANNS algorithms brings benefits. We propose OddEEC, a new EEC for wireless networking. OddEEC extends Odd Sketch for symmetric difference cardinality estimation to estimate the bit error rate of the communication channel, achieving much faster estimation time than existing EEC schemes while maintaining similar accuracy. Finally, this thesis studies how ANN principles can help reduce the computational cost of the LLM reward model. By training a decision tree to approximate reward-model scores, the proposed system can select promising candidates without invoking the reward model on every possible response.

Status

  • Workflow Status:Published
  • Created By:Tatianna Richardson
  • Created:11/20/2025
  • Modified By:Tatianna Richardson
  • Modified:11/20/2025

Categories

Keywords

Target Audience