<node id="686562">
  <nid>686562</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1763674149</created>
  <changed>1763675013</changed>
  <title><![CDATA[PhD Proposal by Huayi Wang]]></title>
  <body><![CDATA[<p><strong>Title</strong>: Approximate Nearest Neighbor Search for Multiple Distance Metrics: Algorithms, Index Structures, and Applications</p><p>&nbsp;</p><p><strong>Date</strong>: Monday, Dec. 1, 2025</p><p><strong>Time</strong>: 11:30AM – 1PM Eastern Time (US)</p><p><strong>Location</strong>: Klaus Building 3402</p><p><strong>Teams link</strong>: &nbsp;<a href="https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZTFlZjkwMzAtOTFjYi00YmE1LWFiOWMtODEyY2QwZjgxNWNj%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%222464cae1-fe5f-452b-a77d-1df702edeab7%22%7d" target="_blank">Huayi’s PhD proposal | Meeting-Join | Microsoft Teams</a></p><p>&nbsp;</p><p>&nbsp;</p><p><strong>Committee</strong></p><p><strong>Dr. Jun (Jim) Xu</strong>&nbsp;- Advisor, Georgia Tech, School of&nbsp;Computational Science</p><p><strong>Dr. Kexin Rong</strong>&nbsp;- Georgia Tech, School of Computational Science</p><p><strong>Dr. Joy Arulraj</strong>- Georgia Tech, School of Computational Science</p><p>&nbsp;</p><p><strong>Abstract：</strong></p><p>Approximate nearest neighbor search (ANNS) is a fundamental algorithmic problem with numerous applications in many areas of computer science, especially databases and machine learning. An intriguing question is how to build index data structures that support efficient ANNS under various useful distance metrics. However, many practically important distance metrics—such as Hamming distance, edit distance, Manhattan distance, and universal <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABkAAAAfCAMAAAAlbpZMAAAAAXNSR0IArs4c6QAAAFpQTFRFAAAAAAAAAAA6AABmADqQAGa2OgAAOjoAOmaQOpDbZgAAZjoAZma2ZpCQZrbbZrb/kDoAkNv/tmYAttv/tv//25A625Bm27Zm2////7Zm/9uQ/9u2//+2///bhE3wrgAAAAF0Uk5TAEDm2GYAAAAJcEhZcwAAFiUAABYlAUlSJPAAAAAZdEVYdFNvZnR3YXJlAE1pY3Jvc29mdCBPZmZpY2V/7TVxAAAAoklEQVQ4T9WSXRLCIAyECdZo8actSoW03P+ahrQzzmhyAPeFhwR2vx2c+1cVYA1q+hfAXccqcHjqkwRHo4kIvT6po5/0CaFlU6DLVgDTxsB0a7AD2DYbZ/zBjZsNIR8z+mm5QCdl7TZLYKi5J7xdc8EWljd2yWOEfKxB6TG1OkjufCm2nEUBlHfqqGCwTa5RI0/+BHB+KP2KjSY1riwSfn7EG5hqBxQuvOhHAAAAAElFTkSuQmCC" width="17" height="21"> distances—still lack effective index structures for efficient ANN search. Although Euclidean distance has been extensively studied, these other metrics are equally important in real applications but still lack efficient ANNS solutions.</p><p>&nbsp;</p><p>This thesis proposes several ANNS solutions for different distance metrics. For Manhattan distance, we propose&nbsp;MP-RW-LSH, the first multi-probe Locality Sensitive Hashing (LSH) scheme tailored for <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABcAAAAcCAMAAAC9M9RRAAAAAXNSR0IArs4c6QAAAFFQTFRFAAAAAAAAAAA6AABmADqQAGa2OgAAOgA6Oma2OpDbZgAAZjoAZrbbZrb/kDoAkNv/tmYAtv//25A627Zm27aQ2////7Zm/9uQ/9u2//+2///bVrOocgAAAAF0Uk5TAEDm2GYAAAAJcEhZcwAAFiUAABYlAUlSJPAAAAAZdEVYdFNvZnR3YXJlAE1pY3Jvc29mdCBPZmZpY2V/7TVxAAAAg0lEQVQoU9WR3RJAIBSEOyjkL6Lw/g8qkphZd26cm2a+ObNnd2PsP2PITQP8aqIaxTCU9ogPxGFqRSXia5t0iFuB5Q1lEz77Ig9DsUXez+o8eLAiyo9VjBhSKc5MMQ2XN+XlrTiei5/ys/SLgc/V3vI+vqOo84z0EXfdgq9b5OHA1bUBf+0FDGY3GCYAAAAASUVORK5CYII=" width="15" height="19"> distance. Compared with the state-of-the-art LSH solution, MP-RW-LSH significantly reduces the number of hash tables required while maintaining similar query accuracy. For Hamming and edit distance, we propose indexable distance estimating codes called&nbsp;iDEC, which extend the error estimating coding (EEC) technique in the networking area to the ANNS problem&nbsp;by treating error estimating coding as a new distance estimation method. We also propose&nbsp;U-HNSW&nbsp;for universal Lp&nbsp;distance metrics. U-HNSW can efficiently answer ANNS&nbsp;queries under the Lp distance without building multiple graph indices for different&nbsp;p&nbsp;values. Beyond ANNS algorithms, this thesis also studies applications where approximate distance estimation used in ANNS algorithms brings benefits. We propose&nbsp;OddEEC, a new EEC for wireless networking. OddEEC extends Odd Sketch for symmetric difference cardinality estimation to estimate the bit error rate of the communication channel, achieving much faster estimation time than existing EEC schemes while maintaining similar accuracy. Finally, this thesis studies how ANN principles can help reduce the computational cost of the LLM reward model.&nbsp;By training a decision tree to approximate reward-model scores, the proposed system can select promising candidates without invoking the reward model on every possible response.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Approximate Nearest Neighbor Search for Multiple Distance Metrics: Algorithms, Index Structures, and Applications]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>Approximate Nearest Neighbor Search for Multiple Distance Metrics: Algorithms, Index Structures, and Applications</p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2025-12-01T11:30:00-05:00]]></value>
      <value2><![CDATA[2025-12-01T13:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Klaus Building 3402]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>102851</tid>
        <value><![CDATA[Phd proposal]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
