PhD Defense by Yang Chen
Title: Extracting Knowledge with Multimodal and Multilingual LLMs
Date/Time: July 12th, 2024, 3:00 PM to 5:00 PM EDT (12:00 PM to 2:00 PM PDT)
Location: Coda C1115 Druid Hills
Zoom: https://gatech.zoom.us/j/99753876757
Yang Chen
Ph.D. Candidate in Computer Science
School of Interactive Computing
Georgia Institute of Technology
Committee:
Dr. Alan Ritter (advisor), School of Interactive Computing, Georgia Tech
Dr. Wei Xu (co-advisor), School of Interactive Computing, Georgia Tech
Dr. Kartik Goyal, School of Interactive Computing, Georgia Tech
Dr. Hexiang (Frank) Hu, Google DeepMind
Dr. Ming-Wei Chang, Google DeepMind
Abstract:
Large language models (LLMs) have revolutionized natural language processing by learning vast amounts of knowledge from online text corpora. These models can utilize pre-trained knowledge to perform a wide range of tasks, and recent advancements have expanded their capabilities to include vision-language inputs. However, extracting and utilizing knowledge from these multimodal and multilingual LLMs presents several challenges. These include accurately benchmarking visual world knowledge, addressing privacy concerns related to memorized personal information, and effectively extracting textual knowledge from multilingual corpora, particularly for low-resource languages. This thesis addresses these challenges by developing and benchmarking methods to extract knowledge with multimodal and multilingual LLMs.
In this presentation, I will first introduce a visual information-seeking benchmark called InfoSeek, designed to evaluate visual knowledge in multimodal LLMs. Using InfoSeek, I will demonstrate how multimodal LLMs fine-tuned on its training set can elicit pre-trained knowledge and generalize to unseen entities. Additionally, I will show how a retrieval-based system can improve accuracy by accessing external resources such as Wikipedia. Building on these findings, I will then discuss an emerging privacy concern raised by the deployment of state-of-the-art multimodal LLMs: their potential to use geolocation capabilities to extract private user information from social media posts.
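To make the retrieval-augmented setup concrete, here is a minimal sketch of the general pattern, not the system evaluated in the thesis: a toy snippet store stands in for Wikipedia, a naive word-overlap scorer stands in for a dense (multimodal) retriever, and the names used here (wiki_snippets, retrieve, answer_with_context) are all hypothetical stand-ins.

```python
from collections import Counter

# Toy stand-in for a Wikipedia-backed knowledge store (illustrative only).
wiki_snippets = {
    "Eiffel Tower": "The Eiffel Tower in Paris was completed in 1889.",
    "Space Needle": "The Space Needle in Seattle opened in 1962.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank snippets by naive word overlap with the query. A real system
    would use a dense retriever over the image and question instead."""
    q_words = Counter(query.lower().split())
    def overlap(text: str) -> int:
        return sum((Counter(text.lower().split()) & q_words).values())
    return sorted(wiki_snippets.values(), key=overlap, reverse=True)[:k]

def answer_with_context(question: str, contexts: list[str]) -> str:
    """Stub for a multimodal LLM call: a real system would condition the
    model on the image, the question, and the retrieved passages."""
    return contexts[0] if contexts else "no evidence retrieved"

question = "When was the Eiffel Tower completed?"
print(answer_with_context(question, retrieve(question)))
```

The point of the design is the division of labor: the retriever narrows the candidate evidence, so the model can ground its answer in external text rather than relying solely on memorized pre-training knowledge.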
The second part of the presentation will focus on extracting knowledge from multilingual corpora, with a particular emphasis on low-resource languages. I will introduce TransFusion, a learning framework that leverages translation models to improve LLM performance on low-resource language tasks. Our experiments demonstrate improvements in African named entity recognition across various settings, including instruction tuning, prompting, and supervised fine-tuning. Finally, I will present EasyProject, a key component for generating annotated information extraction data in multiple languages using a fine-tuned translation model.
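As a rough illustration of translation-based label projection in the spirit of EasyProject, here is a minimal sketch under two assumptions: entity spans are wrapped in bracket markers before translation, and the translation model preserves those markers in order. The translate function below is a placeholder using a toy English-to-French lexicon, standing in for the fine-tuned translation model, and the helper names (mark_entities, project_labels) are hypothetical.

```python
import re

def mark_entities(tokens: list[str], spans: list[tuple[int, int, str]]) -> str:
    """Insert [ ... ] markers around each labeled (start, end, label) span.
    Inserting right-to-left keeps earlier token indices valid."""
    out = list(tokens)
    for start, end, _label in sorted(spans, reverse=True):
        out.insert(end, "]")
        out.insert(start, "[")
    return " ".join(out)

def translate(text: str) -> str:
    """Placeholder: a toy English-to-French word lookup. In practice this
    would be a fine-tuned, marker-preserving machine translation model."""
    fake_lexicon = {"Ada": "Ada", "lives": "vit", "in": "à", "Lagos": "Lagos"}
    return " ".join(fake_lexicon.get(tok, tok) for tok in text.split())

def project_labels(translated: str, labels: list[str]) -> list[tuple[str, str]]:
    """Pair each marker-delimited span in the translation with its label,
    assuming the translation kept the markers in their original order."""
    spans = re.findall(r"\[\s*(.*?)\s*\]", translated)
    return list(zip(spans, labels))

tokens = ["Ada", "lives", "in", "Lagos"]
spans = [(0, 1, "PER"), (3, 4, "LOC")]        # token-index spans with labels
marked = mark_entities(tokens, spans)          # "[ Ada ] lives in [ Lagos ]"
translated = translate(marked)                 # "[ Ada ] vit à [ Lagos ]"
print(project_labels(translated, [label for *_, label in sorted(spans)]))
# -> [('Ada', 'PER'), ('Lagos', 'LOC')]
```

The projected, marker-delimited spans in the target language can then serve as silver annotations for training an information extraction model in that language.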