PhD Defense by Gaurav Verma
Title: Robust, Efficient, and Adaptable Multimodal AI for Vertical Applications
Date: Thursday, April 10, 2025
Time: 2:30–4:30 PM Eastern Time (US)
Location: Coda 1315 (Grant Park)
Virtual Meeting: Zoom
Gaurav Verma
https://gaurav22verma.github.io/
Computer Science PhD Candidate
School of Computational Science and Engineering
Georgia Institute of Technology
Committee:
Dr. Srijan Kumar - Advisor, Georgia Tech, Computational Science & Engineering
Dr. Munmun De Choudhury - Georgia Tech, School of Interactive Computing
Dr. Duen Horng (Polo) Chau - Georgia Tech, Computational Science & Engineering
Dr. Chao Zhang - Georgia Tech, Computational Science & Engineering
Dr. Ani Nenkova - Adobe Research, Document Intelligence Lab
Abstract:
Large artificial intelligence (AI) models have garnered attention for their impressive, sometimes superhuman, performance on benchmarks, yet their practical adoption in verticals like web safety and well-being presents many challenges. Issues such as brittleness to realistic input variations, sensitivity to prompt formatting in large language models (LLMs), performance degradation in specialized settings, and limited effectiveness among certain user groups significantly limit the real-world utility of large AI models.
To systematically address these challenges, this thesis introduces a framework for transforming foundational large AI models into real-world solutions by advancing their vertical-agnostic properties and overcoming challenges in vertical-specific applications. Focusing first on vertical-agnostic properties, the thesis advances multimodal AI models—those integrating vision and language—by tackling three critical areas: robustness to realistic data variations, efficient cross-modal mapping, and adaptability to novel tasks. Key contributions include evaluating model robustness to grounded multimodal variations, proposing a method to quantify text visualness for efficient cross-modal retrieval and generation, and developing techniques for rapidly adapting multimodal agents to custom workflows.
Building upon these foundational contributions, the thesis then targets vertical applications, demonstrating the necessity of tailored data, modeling, and evaluation approaches. In collaboration with domain experts in web safety and well-being, it characterizes and detects violence-provoking speech and leverages LLMs to uncover actionable mental well-being insights. Further, it reveals how insights from vertical-specific studies can loop back to improve large AI models, notably by addressing inequities across languages through multimodal learning. The thesis also underscores user interfacing with the emerging capabilities of large AI models as a vital and open area for future exploration.