{"680200":{"#nid":"680200","#data":{"type":"event","title":"PhD Proposal by Xinyuan Cao","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle\u003C\/strong\u003E: Foundations of Efficient Representation Learning\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EDate\u003C\/strong\u003E: Tuesday, February 11th\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ETime\u003C\/strong\u003E: 1:30 PM - 3:00 PM EST\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ELocation\u003C\/strong\u003E: (Hybrid) Klaus 3100; Zoom link: \u003Ca href=\u0022https:\/\/gatech.zoom.us\/j\/94019785975\u0022 target=\u0022_blank\u0022 title=\u0022https:\/\/gatech.zoom.us\/j\/94019785975\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/94019785975\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EXinyuan Cao\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EMachine Learning Ph.D. Student\u003C\/p\u003E\u003Cp\u003ESchool of Computer Science\u003C\/p\u003E\u003Cp\u003EGeorgia Institute of Technology\u003Cbr\u003E\u003Cbr\u003E\u003Cstrong\u003ECommittee\u003C\/strong\u003E:\u003Cbr\u003E\u2022 Dr. Santosh Vempala (Advisor) | School of Computer Science, Georgia Institute of Technology\u003C\/p\u003E\u003Cp\u003E\u2022 Dr. Jacob Abernethy\u0026nbsp;| School of Computer Science, Georgia Institute of Technology\u003Cbr\u003E\u2022 Dr. Pan Li | School of Electrical and Computer Engineering, Georgia Institute of Technology\u003Cbr\u003E\u2022 Dr. Sahil Singla\u0026nbsp;| School of Computer Science, Georgia Institute of Technology\u003Cbr\u003E\u003Cbr\u003E\u003Cstrong\u003EAbstract\u003C\/strong\u003E:\u003C\/p\u003E\u003Cp\u003ERepresentation learning refers to a set of machine learning methods that first extract lower-dimensional features from complex, unstructured data and then use the learned features for a variety of downstream tasks. 
Despite its empirical success across many domains, a rigorous theoretical foundation for representation learning remains underdeveloped. This thesis aims to bridge this gap by developing theoretical guarantees to better understand representation learning and designing practical algorithms based on theoretical foundations.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EThe first part of this proposal focuses on provable algorithms for feature learning. In supervised settings, I analyze the phenomenon of neural collapse and establish conditions under which it emerges in trained neural networks. In unsupervised learning, I present the first polynomial-time algorithm for learning halfspaces with margins from unlabeled data. My ongoing work explores explainable clustering, where I study the trade-off between interpretability and clustering performance.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn the second part, I investigate the theory of transfer learning, particularly in the lifelong learning setting, where a model sequentially acquires and transfers knowledge from past tasks to future ones. I propose an algorithm with nearly tight sample complexity for this setting, improving on work from a decade ago, and extend it to heuristic algorithms that demonstrate strong empirical performance on real-world datasets.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EThe final part of the proposal explores the practical applications of representation learning. To address the scalability limitation in graph contrastive learning (GCL), I propose a simple yet effective GCL framework based on a sparse low-rank approximation of the diffusion matrix. This method significantly improves efficiency while maintaining competitive performance and can be extended to dynamic graph settings in future work. 
Additionally, my ongoing work investigates the benefits of next-token prediction in large language models.\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EFoundations of Efficient Representation Learning\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Foundations of Efficient Representation Learning"}],"uid":"27707","created_gmt":"2025-02-04 21:30:08","changed_gmt":"2025-02-04 21:30:08","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2025-02-11T13:30:00-05:00","event_time_end":"2025-02-11T16:00:00-05:00","event_time_end_last":"2025-02-11T16:00:00-05:00","gmt_time_start":"2025-02-11 18:30:00","gmt_time_end":"2025-02-11 21:00:00","gmt_time_end_last":"2025-02-11 21:00:00","rrule":null,"timezone":"America\/New_York"},"location":" (Hybrid) Klaus 3100; Zoom link: ","extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"102851","name":"Phd proposal"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}