PhD Defense by Gaurav Tarlok Kakkar

Dear faculty members and fellow students,

You are cordially invited to my Ph.D. thesis defense.

Title: Designing ML-Centric Data Systems for Efficiency and Usability

Date: Friday, November 21st, 2025

Time: 12-2 PM, EST

Location: Klaus Advanced Computing Building (KACB), Room 1212

Gaurav Tarlok Kakkar

Computer Science Ph.D. Student

School of Computer Science
Georgia Institute of Technology

Committee:

Dr. Joy Arulraj (Advisor), School of Computer Science, Georgia Tech
Dr. Sham Navathe, School of Computer Science, Georgia Tech
Dr. Kexin Rong, School of Computer Science, Georgia Tech
Dr. Steve Mussmann, School of Computer Science, Georgia Tech
Dr. Fatma Özcan, Google Systems Research

Abstract:

Over the past six decades, relational databases have been remarkably successful in managing structured data. However, the growing demand for analytics over unstructured data such as videos, images, and text, driven by modern machine learning (ML) workloads, exposes fundamental limitations in traditional database systems. Bridging this gap requires a new class of data systems that treat ML models as first-class citizens, integrating them directly into the query engine and providing optimizations tailored to the unique characteristics of these models.

This dissertation presents the design, implementation, and evaluation of techniques that form the foundation of ML-centric data management systems. It introduces four systems (EVA, Seiden, Aero, and PRISM) that collectively address efficiency and usability challenges across multimodal workloads.

EVA accelerates exploratory video analytics by automatically materializing and reusing the results of expensive user-defined functions (UDFs) through a symbolic reuse framework. Seiden revisits the “proxy model” assumption in visual databases and demonstrates that indexing directly with oracle models, combined with exploration–exploitation sampling, delivers better execution performance and query accuracy than proxy-based approaches. Aero extends adaptive query processing (AQP) to ML workloads, using runtime feedback to reorder predicates and dynamically scale resources, and thereby improves performance over static optimizers. Finally, PRISM optimizes natural-language-to-SQL (NL2SQL) pipelines by treating monetary cost as a first-class objective and systematically navigating the trade-off between accuracy and cost.

Together, these contributions lay the foundation for the next generation of data systems designed for AI-driven workloads.
