event
PhD Defense by Huayi Wang
Primary tabs
Title: Approximate Nearest Neighbor Search for Multiple Distance Metrics: Algorithms, Index Structures, and Applications
Date: Wednesday July 8th
Time: 7:30PM – 9PM Eastern Time (US)
Location: Remote
Teams link: Huayi's Thesis Defense
Thesis Advisory Committee
Dr. Jun (Jim) Xu - Advisor, Georgia Tech, School of Computational Science
Dr. Kexin Rong - Georgia Tech, School of Computational Science
Dr. Joy Arulraj - Georgia Tech, School of Computational Science
Dr. Mitsunori Ogihara - University of Miami, Department of Computer Science
Dr. Gromit Yeuk-Yin Chan - Adobe Research, Research Scientist
Abstract:
Approximate nearest neighbor search (ANNS) is a fundamental algorithmic problem with numerous applications in many areas of computer science, especially databases and machine learning. An intriguing question is how to build index data structures that support efficient ANNS under various useful distance metrics. However, many practically important distance metrics—such as Hamming distance, edit distance, Manhattan distance, and universal Lp distances—still lack effective index structures for efficient ANN search. Although Euclidean distance has been extensively studied, these other metrics are equally important in real applications but still lack efficient ANNS solutions.
This thesis proposes several ANNS solutions for different distance metrics. For Manhattan distance, we propose MP-RW-LSH, the first multi-probe Locality Sensitive Hashing (LSH) scheme tailored for L1 distance. Compared with the state-of-the-art LSH solution, MP-RW-LSH significantly reduces the number of hash tables required while maintaining similar query accuracy. For Hamming and edit distance, we propose indexable distance estimating codes called iDEC, which extend the error estimating coding (EEC) technique in the networking area to the ANNS problem by treating error estimating coding as a new distance estimation method. We also propose U-HNSW for universal Lp distance metrics. U-HNSW can efficiently answer ANNS queries under the Lp distance without building multiple graph indices for different p values.
Beyond ANNS algorithms, this thesis also studies applications where the core principles underlying efficient ANNS—sketching and the avoidance of unnecessary expensive computations—bring broader benefits. We propose OddEEC, a new EEC for wireless networking. OddEEC extends Odd Sketch for symmetric difference cardinality estimation to estimate the bit error rate of the communication channel, achieving much faster estimation time than existing EEC schemes while maintaining similar accuracy. Finally, this thesis studies how these principles can be applied to optimize modern AI workflows. Many AI workflows—ranging from LLM post-training pipelines to agentic reasoning tasks—can be expressed as declarative queries whose expensive predicate is evaluated by a large model or reward function. We propose a query-centric formulation of these workflows and show that classical database techniques, namely approximate query processing (AQP) and proxy-model-based filtering, can substantially reduce the number of expensive model invocations without modifying the underlying models or pipelines.
Groups
Status
- Workflow status: Published
- Created by: Tatianna Richardson
- Created: 06/25/2026
- Modified By: Tatianna Richardson
- Modified: 06/25/2026
Categories
Keywords
User Data
Target Audience