PhD Proposal by Ruohao Guo

Title: Rethinking Safety of Language Models in Interaction

Date: Friday, April 3rd 2026

Time: 9:00 AM – 11:00 AM EDT

Location: Coda C1108 Brookhaven

Zoom: https://gatech.zoom.us/j/93471413440

Ruohao Guo

Ph.D. Student

School of Interactive Computing

Georgia Institute of Technology

Committee members

Dr. Alan Ritter (advisor): School of Interactive Computing, Georgia Institute of Technology

Dr. Wei Xu: School of Interactive Computing, Georgia Institute of Technology

Dr. Polo Chau: School of Computational Science & Engineering, Georgia Institute of Technology

Dr. Dan Roth: Department of Computer and Information Science, University of Pennsylvania; Chief AI Scientist at Oracle

Abstract

The rapid advancement of large language models (LLMs) has brought transformative capabilities but has also introduced critical safety concerns. Prior efforts in AI safety have focused on explicit and direct threats, such as overtly false claims or single-turn attacks. This thesis demonstrates that real-world safety challenges are far more subtle and dynamic, and that current safety mechanisms are inadequate against them. First, I will present our work on how LLMs handle implicit misinformation, i.e., false claims embedded as unchallenged premises in user queries. We show that LLMs can reinforce users' misinformed beliefs through interaction, and that possessing the relevant factual knowledge does not by itself suffice for effective mitigation. Second, I will introduce DialTree, an on-policy reinforcement learning framework that discovers LLM safety vulnerabilities in multi-turn interactive scenarios. We show that even the most safety-aligned frontier models can be jailbroken by our adaptive and strategic attacks. Third, I will present a meta-tuning approach for generalizable language style understanding, which can strengthen the foundational capabilities needed for safety-relevant tasks such as bias detection and manipulation recognition. Finally, I will briefly discuss my ongoing work on improving safety in multi-turn settings by monitoring evolving conversation trajectories.

Groups

Graduate Studies

Status

Workflow status: Published
Created by: Tatianna Richardson
Created: 03/31/2026
Modified By: Tatianna Richardson
Modified: 03/31/2026
