PhD Defense by Gaurav Tarlok Kakkar
Dear faculty members and fellow students,
You are cordially invited to my Ph.D. thesis defense.
Title: Designing ML-Centric Data Systems for Efficiency and Usability
Date: Friday, November 21st, 2025
Time: 12-2 PM, EST
Location: Klaus Advanced Computing Building (KACB), Room 1212
Gaurav Tarlok Kakkar
Computer Science Ph.D. Student
School of Computer Science
Georgia Institute of Technology
Committee:
- Dr. Joy Arulraj (Advisor), School of Computer Science, Georgia Tech
- Dr. Sham Navathe, School of Computer Science, Georgia Tech
- Dr. Kexin Rong, School of Computer Science, Georgia Tech
- Dr. Steve Mussmann, School of Computer Science, Georgia Tech
- Dr. Fatma Özcan, Google System Research
Abstract:
Over the past six decades, relational databases have been remarkably successful in managing structured data. However, the growing demand for analytics over unstructured data, such as videos, images, and text, driven by modern machine learning (ML) workloads, exposes fundamental limitations in traditional database systems. Bridging this gap requires a new class of data systems that treat ML models as first-class citizens, integrating them directly into the query engine and providing optimizations tailored to their unique characteristics.
This dissertation presents the design, implementation, and evaluation of techniques that form the foundation of ML-centric data management systems. It introduces four systems (EVA, Seiden, Aero, and PRISM) that collectively address challenges of efficiency and usability across multimodal workloads.
EVA accelerates exploratory video analytics by automatically materializing and reusing the results of expensive user-defined functions (UDFs) through a symbolic reuse framework. Seiden revisits the “proxy model” assumption in visual databases and demonstrates that indexing directly with oracle models, combined with exploration–exploitation sampling, delivers superior execution performance and query accuracy. Aero extends adaptive query processing (AQP) to ML workloads by using runtime feedback to reorder predicates and dynamically scale resources, achieving performance improvements over static optimizers. Finally, PRISM optimizes natural-language-to-SQL (NL2SQL) pipelines by treating monetary cost as a first-class objective and systematically navigating the trade-off between accuracy and cost.
Together, these contributions lay the foundation for the next generation of data systems designed for AI-driven workloads.
Status
- Workflow Status: Published
- Created By: Tatianna Richardson
- Created: 11/17/2025
- Modified By: Tatianna Richardson
- Modified: 11/17/2025