event

PhD Defense by Guan-Horng Liu

Title: Large-Scale Optimization for Deep Neural Network Architecture: A Dynamical System Theory Perspective

 

Date: Wednesday, June 26th, 2024

Time: 1:00 - 3:00 pm EDT (6:00 - 8:00 pm London time)

Location/Remote link: Coda C0915 Atlantic https://gatech.zoom.us/j/3392051118?omn=97259773696

 

Guan-Horng Liu

Machine Learning PhD Student

School of Aerospace Engineering

Georgia Institute of Technology

 

Committee

1. Dr. Evangelos Theodorou (School of Aerospace Engineering, Georgia Tech; Advisor)

2. Dr. Molei Tao (School of Mathematics, Georgia Tech)

3. Dr. Yao Xie (School of Industrial and Systems Engineering, Georgia Tech)

4. Dr. Justin Romberg (School of Electrical and Computer Engineering, Georgia Tech)

5. Dr. Arnaud Doucet (Department of Statistics, University of Oxford; Google DeepMind)

 

Abstract

Optimization of deep neural networks (DNNs) has been a driving force in the advancement of modern artificial intelligence. Despite efforts to design DNN architectures that leverage domain-specific knowledge, the development of optimization algorithms has often progressed independently of architectural innovations. This thesis studies large-scale optimization methods that exploit the structure of the deep architectures being optimized. Specifically, we demonstrate that dynamical systems and optimal control theory provide a principled foundation for characterizing algorithms that train various deep architectures, including standard DNNs, Neural ODEs, and SDEs such as diffusion models/bridges.

Optimal control, in its broadest sense, examines the principle of optimization over dynamical systems. This methodological perspective arises naturally in training neural differential equations and can also be applied to standard DNNs, with backpropagation emerging as a form of approximate dynamic programming. Throughout, we emphasize the significance of control-theoretic components such as differential programming and nonlinear Feynman-Kac, which unify existing optimization methods and extend them to a broader class of complex dynamics and problem setups that may otherwise be hard to adapt to or foresee. The developed methods have been applied to large-scale applications such as image generation, restoration, and translation, as well as solving mean-field games and opinion modeling.

 

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 06/24/2024
  • Modified By: Tatianna Richardson
  • Modified: 06/24/2024
