PhD Defense by Chia-Wen Kuo
You are cordially invited to attend my dissertation defense on Wednesday, November 29th.
- Title: Knowledge-Augmented Vision-and-Language Assistant
- Date: Wednesday, November 29th, 2023
- Time: 11:00 AM - 12:30 PM PST
- Location: this Zoom link
Chia-Wen Kuo
Robotics PhD Candidate
School of Electrical and Computer Engineering
Georgia Institute of Technology
Committee:
Dr. Zsolt Kira (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. Chao Zhang - School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Chunyuan Li - Principal Research Scientist, Microsoft Research
Dr. Judy Hoffman - School of Interactive Computing, Georgia Institute of Technology
Dr. Larry Heck - School of Electrical and Computer Engineering, Georgia Institute of Technology
Abstract:
The fusion of vision and language (VL) in artificial intelligence is a crucial step toward truly intelligent systems, echoing a fundamental aspect of human cognition: the ability to see the world and describe it. This integration has transformative potential across many sectors, notably in how humans interact with technology. However, developing effective VL models is challenging because the knowledge available to both the vision and language components is often incomplete or missing. This limitation impairs the models' ability to describe visual content accurately and to answer complex, real-world questions. My research, presented in a series of three works, addresses these challenges. The first work, Xmodal-Ctx, introduces external knowledge into VL models to overcome their contextual limitations. The second, HAAV, extends this idea by integrating a diverse array of knowledge sources, improving the models' understanding of visual content. The final work, K-Aug, scales these concepts to larger, more complex multimodal models and addresses how high-quality knowledge sources can be integrated and applied. Together, these works aim to bridge the knowledge gaps in VL models, improving their ability to interpret and describe visual content in a context-rich and linguistically coherent way.
Status
- Created By: Tatianna Richardson
- Created: 11/27/2023
- Modified By: Tatianna Richardson
- Modified: 11/27/2023