PhD Proposal by Gaurav Verma

Title: Robust and Efficient Vision-Language Learning for Equity, Safety, and Well-Being

 

Date: Thursday, May 02, 2024

Time: 11:15 AM to 1:00 PM Eastern Time (US)

Location: Coda C1315

Virtual Meeting: Zoom 

 

Gaurav Verma

https://gaurav22verma.github.io/ 

CS Ph.D. Student

School of Computational Science and Engineering

College of Computing

Georgia Institute of Technology

 

Committee:

Dr. Srijan Kumar - Advisor, Georgia Tech, Computational Science & Engineering

Dr. Munmun De Choudhury - Georgia Tech, School of Interactive Computing

Dr. Duen Horng (Polo) Chau - Georgia Tech, Computational Science & Engineering

Dr. Ani Nenkova - Adobe Research

 

 

Abstract:

The long-term goal of developing Artificial Intelligence (AI) systems is to enable broadly useful human-AI interactions for individuals, groups, and societies. The future of such AI systems is inherently multimodal, and the current shift in the landscape of AI research and development is a clear illustration: powerful systems that reason over and generate vision, language, audio, and other forms of unstructured data are emerging rapidly. The robustness and efficiency of multimodal AI are imperative for its widespread adoption. Furthermore, since AI tools are socio-technical systems, it is also critical that societal dimensions like equity, safety, and well-being are prioritized among its applications. To this end, the objective of this thesis proposal is to evaluate and efficiently strengthen the robustness of vision-language learning, steering AI development toward enhancing language equity, online safety, and individual & public well-being.

 

How far are we from achieving 'three nines' reliability in multimodal AI systems, and how do we get there? It is important that the underlying vision-language models that power AI applications are robust to both unintentional and intentional variations in input data. This requires systematic and efficient approaches to evaluate their robustness and to overcome observed vulnerabilities. This thesis proposal presents work that develops systematic methods to evaluate the robustness of vision-language models to plausible changes in the input data, specifically cross-modal dilutions and insertions. Furthermore, we propose modeling text visualness as an efficient approach to text-to-image mapping in long-form content generation tasks.

 

Artificial Intelligence for Equity, Safety, and Well-Being. This thesis proposal aims to deliver three-pronged advances along important societal dimensions: (i) highlighting the language inequities that could be propagated by the use of language-only models and proposing vision-language learning as an approach to enable more equitable outcomes across English and non-English languages, (ii) developing AI approaches to enhance the safety of online spaces in a community-centric manner, specifically by characterizing and detecting malicious speech and actors, and (iii) using language models to discover insights that can inform policies for improving individual and public well-being.
