<![CDATA[PhD Defense by Yang Chen]]>

Title: Extracting Knowledge with Multimodal and Multilingual LLMs

Date/Time: July 12th, 2024, 3:00 PM to 5:00 PM ET (12:00 PM to 2:00 PM PT)

Location: Coda C1115 Druid Hills

Zoom: https://gatech.zoom.us/j/99753876757

Yang Chen

Ph.D. Candidate in Computer Science

School of Interactive Computing

Georgia Institute of Technology

Committee:

Dr. Alan Ritter (advisor), School of Interactive Computing, Georgia Tech

Dr. Wei Xu (co-advisor), School of Interactive Computing, Georgia Tech

Dr. Kartik Goyal, School of Interactive Computing, Georgia Tech

Dr. Hexiang (Frank) Hu, Google DeepMind

Dr. Ming-Wei Chang, Google DeepMind

Abstract:

Large language models (LLMs) have revolutionized natural language processing by learning vast amounts of knowledge from online text corpora. These models can utilize pre-trained knowledge to perform a wide range of tasks, and recent advancements have expanded their capabilities to include vision-language inputs. However, extracting and utilizing knowledge from these multimodal and multilingual LLMs presents several challenges. These include accurately benchmarking visual world knowledge, addressing privacy concerns related to memorized personal information, and effectively extracting textual knowledge from multilingual corpora, particularly for low-resource languages. This thesis addresses these challenges by developing and benchmarking methods to extract knowledge with multimodal and multilingual LLMs.

In this presentation, I will first introduce InfoSeek, a visual information-seeking benchmark designed to evaluate visual knowledge in multimodal LLMs. Using InfoSeek, I will demonstrate how multimodal LLMs fine-tuned on a training set can elicit pre-trained knowledge and generalize to unseen entities. Additionally, I will show how a retrieval-based system can improve accuracy by accessing external resources such as Wikipedia. Building on these findings, I will then discuss an emerging privacy concern related to the deployment of state-of-the-art multimodal LLMs: their geolocation capabilities could be used to extract private user information from social media posts.

The second part of the presentation will focus on extracting knowledge from multilingual corpora, with a particular emphasis on low-resource languages. I will introduce TransFusion, a learning framework that leverages translation models to enhance LLM performance on low-resource language tasks. Our experiments demonstrate improvements in named entity recognition for African languages across various settings, including instruction-tuning, prompting, and supervised fine-tuning. Finally, I will present EasyProject, a crucial component in generating annotated information extraction data across multiple languages using a fine-tuned translation model.
