PhD Defense by Guan-Horng Liu
Title: Large-Scale Optimization for Deep Neural Network Architecture: A Dynamical System Theory Perspective
Date: Wednesday, June 26th, 2024
Time: 1:00 - 3:00 pm EDT (6:00 - 8:00 pm London time)
Location/Remote link: Coda C0915 Atlantic https://gatech.zoom.us/j/3392051118?omn=97259773696
Guan-Horng Liu
Machine Learning PhD Student
School of Aerospace Engineering
Georgia Institute of Technology
Committee
1. Dr. Evangelos Theodorou (School of Aerospace Engineering, Georgia Tech; Advisor)
2. Dr. Molei Tao (School of Mathematics, Georgia Tech)
3. Dr. Yao Xie (School of Industrial and Systems Engineering, Georgia Tech)
4. Dr. Justin Romberg (School of Electrical and Computer Engineering, Georgia Tech)
5. Dr. Arnaud Doucet (Department of Statistics, University of Oxford; Google DeepMind)
Abstract
Optimization of deep neural networks (DNNs) has been a driving force in the advancement of modern artificial intelligence. Despite efforts to design DNN architectures that leverage domain-specific knowledge, the development of optimization algorithms has often progressed independently of architectural innovations. This thesis investigates large-scale optimization methods that exploit the structure of the deep architectures being optimized. Specifically, we demonstrate that dynamical systems and optimal control theory provide a principled foundation for characterizing algorithms that learn various deep architectures, including standard DNNs, Neural ODEs, and SDEs such as diffusion models and bridges.
Optimal control, in its broadest sense, examines the principle of optimization over dynamical systems. This methodological perspective arises naturally in training neural differential equations and also applies to standard DNNs, where backpropagation emerges as a form of approximate dynamic programming. Throughout this development, we emphasize the significance of control-theoretic components such as differential programming and nonlinear Feynman-Kac, which unify existing optimization methods and extend them to a broader class of complex dynamics and problem setups that may otherwise be hard to adapt to or foresee. The developed methods have been applied to large-scale applications such as image generation, restoration, and translation, as well as to solving mean-field games and opinion modeling.
Status
- Workflow Status: Published
- Created By: Tatianna Richardson
- Created: 06/24/2024
- Modified By: Tatianna Richardson
- Modified: 06/24/2024