PhD Proposal by Pramod Chunduri
Title: Advanced Query Processing Systems for Unstructured Data Management
Date: Monday, March 11th, 2024
Time: 1:00 - 2:30 PM EDT
Location: Klaus 1315
Virtual Link: Teams
Pramod Chunduri
(https://pchunduri6.github.io/)
Database Systems Ph.D. Student
School of Computer Science
Georgia Institute of Technology
Committee:
Dr. Joy Arulraj (Advisor) - School of Computer Science, Georgia Institute of Technology
Dr. Kexin Rong - School of Computer Science, Georgia Institute of Technology
Dr. Xu Chu - School of Computer Science, Georgia Institute of Technology
Dr. Shamkant Navathe - School of Computer Science, Georgia Institute of Technology
Abstract:
The exponential growth of unstructured data, such as video, images, audio, and text, presents significant challenges for efficient processing and analysis. While machine learning (ML), particularly deep learning (DL), has made impressive strides in developing models to process such data, the practical application of these models to large-scale data is hindered by high costs, the inability to query fine-grained information, and the difficulty of selecting appropriate models for specific tasks. My thesis aims to address these challenges by developing efficient, accurate, and practical query processing systems for unstructured data management.
In this proposal, I present three query processing systems to achieve this objective. First, I present ZEUS, a video analytics system that leverages reinforcement learning to rapidly localize complex actions in videos while maintaining a user-specified accuracy. I then present SketchQL, a user-friendly, sketch-based query system that allows intuitive retrieval of fine-grained video moments and significantly enhances the usability and accuracy of that retrieval.
Finally, I propose an automated model selection framework for heterogeneous model ecosystems. In the past year, large language models (LLMs) have taken giant leaps in unstructured text processing. A wide range of models is available, both as proprietary API-based offerings and as open-source models. These models are very expensive to run and have diverse performance profiles on user queries. Our preliminary work demonstrates that a careful model selection process can significantly cut query costs while reaching state-of-the-art accuracy. We aim to build a novel model routing strategy for heterogeneous LLMs that optimizes the cost, latency, and accuracy of unstructured text processing.