PhD Defense by Nikolai Warner
Title: Improving Out-of-Distribution Generalization in Human-Centric Multimodal Vision
Date: Monday, June 22, 2026
Time: 1:00 - 3:00 PM ET
Location: Coda C0915 Atlantic + Remote (https://teams.microsoft.com/meet/250438047509225?p=kzLrPnM2Ap8Ny0Dq9t)
Meeting ID: 250 438 047 509 225 | Passcode: 2Tn3Us6F
Nikolai Warner
Robotics Ph.D. Candidate
George W. Woodruff School of Mechanical Engineering
Georgia Institute of Technology
Committee
Dr. Irfan Essa (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. Thomas Ploetz - School of Interactive Computing, Georgia Institute of Technology
Dr. Zsolt Kira - School of Interactive Computing, Georgia Institute of Technology
Dr. Judy Hoffman - School of Interactive Computing, Georgia Institute of Technology
Dr. Apaar Sadhwani - Amazon
Abstract
Despite steady in-distribution progress on human-centric vision tasks and the emergence of powerful foundation models, in-the-wild and out-of-distribution (OOD) performance still lags. This dissertation studies four such tasks (interactive segmentation, non-rigid image editing, 3D human pose estimation, and motion-language alignment) and traces their OOD gap to two distinct failures: a signal-side failure, where the input modality is ill-posed for the task, and a noise-side failure, where the supervision channel carries distribution-specific nuisance.
On the signal side, DAISeg enriches click-conditioned segmentation with an open-vocabulary saliency channel (gains ranging from +3 mIoU on seen classes to +10.5 mIoU on unseen classes, beating SAM under text-conditioned clicks), and AugLift supplies the 2D-to-3D lifter with a per-joint depth lower bound (−8.9% OOD MPJPE across four architectures, plus state-of-the-art cross-dataset results when combined with domain generalization (DG) techniques).
On the noise side, IPC-Edit constructs supervision that previously had no public equivalent, filtering and composing three noisy proxies into a 13.5K-pair corpus for identity-preserving non-rigid editing (68.5% identity preservation vs. 61%), while MoCHA denoises supervision that already exists, distilling an LLM canonicalization operator that strips annotator style from captions and setting a new cross-distribution state of the art (T2M R@1 from 13.74 to 26.59, +94%).
Status
- Created by: Tatianna Richardson
- Created: 06/17/2026
- Modified By: Tatianna Richardson
- Modified: 06/17/2026