PhD Defense by Chia-Wen Kuo


You are cordially invited to attend my dissertation defense on Wednesday, November 29th.


  • Title: Knowledge-Augmented Vision-and-Language Assistant
  • Date: Wednesday, November 29th, 2023
  • Time: 11:00 AM - 12:30 PM PST
  • Location: Zoom


Chia-Wen Kuo

Robotics PhD Candidate

School of Electrical and Computer Engineering

Georgia Institute of Technology



Dr. Zsolt Kira (Advisor) - School of Interactive Computing, Georgia Institute of Technology

Dr. Chao Zhang - School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Chunyuan Li - Principal Research Scientist, Microsoft Research

Dr. Judy Hoffman - School of Interactive Computing, Georgia Institute of Technology

Dr. Larry Heck - School of Electrical and Computer Engineering, Georgia Institute of Technology



The fusion of vision and language (VL) in artificial intelligence represents a crucial advancement in the creation of truly intelligent systems, echoing a fundamental aspect of human cognition: the ability to see and articulate the world. This integration has transformative potential across many sectors, notably in enhancing human interaction with technology. However, developing effective VL models is challenging because knowledge in both the vision and language components is often incomplete or missing. This limitation impairs the models' ability to accurately describe visual content and answer complex, real-world questions. My research, presented as a series of three works, addresses these challenges. The first work, Xmodal-Ctx, introduces external knowledge into VL models to overcome their contextual limitations. The second, HAAV, expands on this by integrating a diverse array of knowledge sources, enhancing the models' understanding of visual content. The final work, K-Aug, scales these concepts to larger, more complex multimodal models, addressing the integration and application of high-quality knowledge sources. This structured approach aims to bridge the knowledge gaps in VL models, thereby enhancing their overall interpretative and descriptive capabilities in a context-rich and linguistically coherent manner.





