<node id="686472">
  <nid>686472</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1763388631</created>
  <changed>1763388761</changed>
  <title><![CDATA[PhD Defense by Gaurav Tarlok Kakkar]]></title>
  <body><![CDATA[<p>Dear faculty members and fellow students,</p><p>&nbsp;</p><p>You are cordially invited to my&nbsp;Ph.D.&nbsp;thesis&nbsp;defense.&nbsp;</p><p>&nbsp;</p><p><strong>Title:&nbsp;</strong>Designing ML-Centric Data Systems for Efficiency and Usability<br>&nbsp;</p><p><strong>Date:&nbsp;</strong>Friday, November 21st, 2025<br>&nbsp;</p><p><strong>Time:&nbsp;</strong>12-2 PM, EST</p><p>&nbsp;</p><p><strong>Location:&nbsp;</strong>Klaus Advanced Computing Building (KACB), Room 1212</p><p>&nbsp;</p><p>Gaurav Tarlok Kakkar</p><p>Computer Science Ph.D. Student</p><p>School of Computer Science<br>Georgia Institute of Technology</p><p>&nbsp;</p><p><strong>Committee:</strong></p><ol><li>Dr. Joy Arulraj (Advisor), School of Computer Science, Georgia Tech</li><li>Dr. Sham Navathe, School of Computer Science, Georgia Tech</li><li>Dr. Kexin Rong, School of Computer Science, Georgia Tech</li><li>Dr. Steve Mussmann, School of Computer Science, Georgia Tech</li><li>Dr. Fatma Özcan, Google System Research</li></ol><p>&nbsp;</p><p><strong>Abstract:</strong></p><p>Over the past six decades, relational databases have been remarkably successful in managing structured data. However, the growing demand for analytics over unstructured data, such as videos, images, and text, driven by modern machine learning (ML) workloads, exposes fundamental limitations in traditional database systems. Bridging this gap requires a new class of data systems that treat ML models as first-class citizens, integrating them directly into the query engine and providing optimizations tailored to their unique characteristics.</p><p>&nbsp;</p><p>This dissertation presents the design, implementation, and evaluation of techniques that form the foundation of ML-centric data management systems. 
It introduces four systems, EVA, Seiden, Aero, and PRISM, that collectively address challenges of efficiency and usability across multimodal workloads.</p><p>&nbsp;</p><p>EVA accelerates exploratory video analytics by automatically materializing and reusing the results of expensive user-defined functions (UDFs) through a symbolic reuse framework. Seiden revisits the “proxy model” assumption in visual databases and demonstrates that indexing directly with oracle models and exploration–exploitation sampling delivers superior execution performance and query accuracy. Aero extends adaptive query processing (AQP) to ML workloads by using runtime feedback to reorder predicates and dynamically scale resources, achieving performance improvements over static optimizers. Finally, PRISM optimizes natural language to SQL (NL2SQL) pipelines by treating monetary cost as a first-class objective and systematically navigating the trade-off between accuracy and cost.</p><p>&nbsp;</p><p>Together, these contributions lay the foundation for the next generation of data systems designed for AI-driven workloads.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Designing ML-Centric Data Systems for Efficiency and Usability]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>Designing ML-Centric Data Systems for Efficiency and Usability</p>]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2025-11-21T12:00:00-05:00]]></value>
      <value2><![CDATA[2025-11-21T14:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Klaus Advanced Computing Building (KACB), Room 1212]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[Phd Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
