PhD Proposal by Pramod Chunduri

Title: Advanced Query Processing Systems for Unstructured Data Management

Date: Monday, March 11th, 2024

Time: 1:00 - 2:30 PM EDT

Location: Klaus 1315

Virtual Link: Teams

Pramod Chunduri

(https://pchunduri6.github.io/)

Database Systems Ph.D. Student

School of Computer Science

Georgia Institute of Technology

Committee:

Dr. Joy Arulraj (Advisor) – School of Computer Science, Georgia Institute of Technology

Dr. Kexin Rong – School of Computer Science, Georgia Institute of Technology

Dr. Xu Chu – School of Computer Science, Georgia Institute of Technology

Dr. Shamkant Navathe – School of Computer Science, Georgia Institute of Technology

Abstract:

The exponential increase in unstructured data, such as video, images, audio, and text, presents significant challenges for efficient processing and analysis. While machine learning (ML), particularly deep learning (DL), has made impressive strides in developing models for these tasks, applying these models to large-scale data is hindered by high costs, the inability to query fine-grained information, and the difficulty of selecting appropriate models for specific tasks. My thesis aims to address these challenges by developing efficient, accurate, and practical query processing systems for unstructured data management.

In this proposal, I present three query processing systems to achieve this objective. First, I present ZEUS, a video analytics system that uses reinforcement learning to localize complex actions in videos quickly while maintaining a user-specified accuracy. I then present SketchQL, a user-friendly, sketch-based query system for intuitive retrieval of fine-grained video moments. SketchQL significantly improves both the usability and the accuracy of this retrieval.

Finally, I propose an automated model selection framework for heterogeneous model ecosystems. In the past year, large language models (LLMs) have taken giant leaps in unstructured text processing. A wide range of models is available, both as proprietary API-based offerings and as open-source models. These models are very expensive and have diverse performance profiles across user queries. Our preliminary work demonstrates that careful model selection can significantly cut query costs while reaching state-of-the-art accuracy. We aim to build a novel model routing strategy for heterogeneous LLMs that optimizes the cost, latency, and accuracy of unstructured text processing.
