PhD Defense by Xinyuan Cao

Title: Foundations of Efficient Representation Learning

 

Date: March 17, 2026

Time: 12:00 pm - 2:00 pm ET

Location: Klaus 1212

Zoom link: https://gatech.zoom.us/j/94992205747

 

Xinyuan Cao

Machine Learning PhD Student

School of Computer Science
Georgia Institute of Technology

 

Committee

1. Dr. Santosh Vempala (Advisor), School of Computer Science, Georgia Institute of Technology

2. Dr. Jacob Abernethy, School of Computer Science, Georgia Institute of Technology

3. Dr. Pan Li, School of Electrical and Computer Engineering, Georgia Institute of Technology

4. Dr. Sahil Singla, School of Computer Science, Georgia Institute of Technology

5. Dr. Freda Shi, School of Computer Science, University of Waterloo

 

Abstract

Representation learning extracts lower-dimensional, structured features from complex, unstructured data and reuses them across tasks. Despite strong empirical performance, its theoretical foundations remain limited. This thesis addresses that gap by developing formal efficiency guarantees for representation learning.

 

First, we study unsupervised identification of geometric structure and give a polynomial-time algorithm that recovers a halfspace with margin from unlabeled data under broad distributional conditions. Next, we analyze implicit structure in sequence modeling. By formalizing long-range structure using efficient distinguishers, we prove that minimizing next-token prediction loss over bounded-size networks yields an indistinguishable language model, with model size polynomial in the distinguisher parameters and independent of document length. Finally, we study how learned structures can be efficiently transferred in lifelong learning, where tasks arrive sequentially and the model continually refines the representation while maintaining performance on earlier tasks. We propose algorithms that dynamically learn, refine, and reuse features across sequential tasks, achieving near-optimal sample complexity. 
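As a small illustrative aside (not part of the thesis itself), the margin condition in the first result can be sketched concretely: a unit vector w defines a halfspace through the origin, and a point x has normalized margin |w·x| / ||x||; a gamma-margin halfspace is one where every data point clears this threshold. The helper below is a hypothetical illustration of that condition, not the recovery algorithm from the thesis.

```python
import numpy as np

def satisfies_margin(X, w, gamma):
    """Illustrative check (hypothetical helper, not the thesis algorithm):
    does every row of X lie at normalized distance >= gamma from the
    hyperplane through the origin with normal w?"""
    w = w / np.linalg.norm(w)           # normalize the direction
    norms = np.linalg.norm(X, axis=1)   # per-point norms
    margins = np.abs(X @ w) / norms     # normalized margins |w.x| / ||x||
    return bool(np.all(margins >= gamma))

# Points well separated from the hyperplane x[0] = 0:
X = np.array([[1.0, 0.2], [-1.0, 0.1], [2.0, -0.3]])
w = np.array([1.0, 0.0])
print(satisfies_margin(X, w, 0.5))   # True: all normalized margins exceed 0.5
```

The unsupervised problem studied in the thesis is harder than this check suggests: the direction w is unknown and must be recovered from unlabeled samples alone.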

 

Together, these results provide a unified learning-theoretic foundation for efficient representation learning, spanning how structure can be identified, induced by training objectives, and transferred across tasks.

 

Status

  • Workflow status: Published
  • Created by: Tatianna Richardson
  • Created: 03/09/2026
  • Modified By: Tatianna Richardson
  • Modified: 03/09/2026
