PhD Proposal by Shuo Cheng

Title: Robot Skill Representations for Long-Horizon Task Learning and Deployment

Shuo Cheng

Ph.D. Student in Computer Science

School of Interactive Computing 

Georgia Institute of Technology 

https://sites.google.com/view/shuocheng


Date: Monday, November 17, 2025

Time: 11:00 AM – 12:30 PM EST

Location: Klaus 1212, Zoom Link

Committee: 

Dr. Danfei Xu (Advisor) – School of Interactive Computing, Georgia Institute of Technology

Dr. Sehoon Ha – School of Interactive Computing, Georgia Institute of Technology

Dr. Harish Ravichandar – School of Interactive Computing, Georgia Institute of Technology

Abstract

Robots must acquire versatile and generalizable skills to operate effectively in complex, unstructured environments. This thesis investigates how to enable such capabilities through two complementary directions: (1) structured skill representations for scalable and generalizable learning, and (2) data generation and co-training frameworks for learning reactive, deployable policies. The first part develops representations that embed structural inductive biases to support efficient skill acquisition and reuse. LEAGUE progressively learns neuro-symbolic reinforcement learning policies within a task and motion planning framework, enabling compositional skill learning over long horizons; this work received a Best Paper Honorable Mention from IEEE RA-L. NOD-TAMP enables one-shot skill adaptation to novel object geometries and configurations through learned neural object descriptors. The second part introduces robotic data generation and policy distillation frameworks for large-scale, cross-domain learning. OT-Sim2Real leverages the learned skill representations to synthesize diverse simulation demonstrations and employs a co-training strategy that improves policy generalization beyond real-world data coverage, bridging the gap between simulation and deployment.
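
To make the neuro-symbolic decomposition concrete, the sketch below shows, in schematic Python, how symbolic operators with preconditions and effects can wrap learned low-level controllers and be chained into a long-horizon plan. This is a minimal illustration of the general idea only; all class, function, and predicate names are hypothetical and do not reflect LEAGUE's actual implementation or API.

# Minimal sketch (hypothetical names, not the actual LEAGUE code): symbolic
# operators whose low-level controllers would be learned RL policies, chained
# into a long-horizon plan by a simple breadth-first symbolic search.
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

State = FrozenSet[str]  # the set of symbolic predicates currently true

@dataclass
class SkillOperator:
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str]
    # Stand-in for a learned RL policy that executes the skill on the robot.
    policy: Callable[[object], object] = lambda obs: None

    def applicable(self, state: State) -> bool:
        return self.preconditions <= state

    def apply(self, state: State) -> State:
        return (state - self.delete_effects) | self.add_effects

def plan(init: State, goal: FrozenSet[str],
         ops: List[SkillOperator], max_depth: int = 8) -> Optional[List[str]]:
    """Breadth-first search over symbolic states; returns a skill sequence."""
    frontier = [(init, [])]
    for _ in range(max_depth):
        nxt = []
        for state, seq in frontier:
            if goal <= state:
                return seq
            for op in ops:
                if op.applicable(state):
                    nxt.append((op.apply(state), seq + [op.name]))
        frontier = nxt
    return None

if __name__ == "__main__":
    pick = SkillOperator("pick(cup)",
                         frozenset({"clear(cup)", "handempty"}),
                         frozenset({"holding(cup)"}),
                         frozenset({"handempty", "clear(cup)"}))
    place = SkillOperator("place(cup, shelf)",
                          frozenset({"holding(cup)"}),
                          frozenset({"on(cup, shelf)", "handempty"}),
                          frozenset({"holding(cup)"}))
    goal = frozenset({"on(cup, shelf)"})
    print(plan(frozenset({"clear(cup)", "handempty"}), goal, [pick, place]))
    # -> ['pick(cup)', 'place(cup, shelf)']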

Proposed Work: Building on these foundations, the proposed research will investigate how large-scale human videos can be leveraged to learn composable motion priors for scalable robot manipulation systems that generalize across diverse tasks and embodiments. A factor graph representation is proposed to decompose the manipulation process into structured subcomponents, facilitating efficient learning of motion samplers from human videos. In addition, a diffusion-based steering strategy will be explored to guide the sampling process toward trajectories that efficiently satisfy task-specific constraints.
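
As a rough illustration of what constraint-guided sampling can look like, the sketch below adds a constraint-cost gradient to each reverse-diffusion step, biasing sampled trajectories toward a target end pose, in the spirit of classifier guidance. The denoiser is a trivial placeholder for a learned model, and every name is hypothetical; the proposal's actual steering formulation may differ substantially.

# Schematic sketch of constraint-guided ("steered") diffusion sampling for
# trajectories. The denoiser below is a trivial placeholder for a learned
# model eps_theta(x, t); all names here are hypothetical.
import numpy as np

def cost_grad(traj: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of an example task constraint: reach `target` at the end."""
    g = np.zeros_like(traj)
    g[-1] = 2.0 * (traj[-1] - target)  # d/dx of ||traj[-1] - target||^2
    return g

def denoise_step(x: np.ndarray, t: int) -> np.ndarray:
    """Placeholder reverse-diffusion step; a learned denoiser goes here."""
    return 0.95 * x

def steered_sample(target: np.ndarray, steps: int = 50, horizon: int = 16,
                   dim: int = 2, guide_scale: float = 0.1, seed: int = 0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(horizon, dim))             # start from Gaussian noise
    for t in reversed(range(steps)):
        x = denoise_step(x, t)                      # standard denoising update
        x = x - guide_scale * cost_grad(x, target)  # steer toward constraint
        if t > 0:                                   # residual noise injection
            x = x + 0.01 * rng.normal(size=x.shape)
    return x

if __name__ == "__main__":
    traj = steered_sample(target=np.array([1.0, 1.0]))
    print("final waypoint:", traj[-1])  # pulled toward the target pose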
