ISyE Department Seminar- Jianqing Fan

Event Details
  • Date/Time:
    • Friday October 22, 2021
      11:00 am - 12:00 pm
  • Location: ISyE Building- Groseclose 119
  • URL: ISyE Building
  • Fee(s):
    N/A
Summaries

Summary Sentence: Understanding Deep Q-Learning

Title: Understanding Deep Q-Learning

Abstract: Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood. In this work, we make the first attempt to theoretically understand the deep Q-network (DQN) algorithm from both algorithmic and statistical perspectives. Specifically, we focus on a slight simplification of DQN that fully captures its key features. Under mild assumptions, we establish the algorithmic and statistical rates of convergence for the action-value functions of the iterative policy sequence obtained by DQN. In particular, the statistical error characterizes the bias and variance that arise from approximating the action-value function using a deep neural network, while the algorithmic error converges to zero at a geometric rate. As a byproduct, our analysis provides justifications for the techniques of experience replay and target networks, which are crucial to the empirical success of DQN. Furthermore, as a simple extension of DQN, we propose the Minimax-DQN algorithm for two-player zero-sum Markov games. Borrowing the analysis of DQN, we also quantify the difference between the policies obtained by Minimax-DQN and the Nash equilibrium of the Markov game in terms of both the algorithmic and statistical rates of convergence.
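
For readers unfamiliar with the phrase "converges to zero at a geometric rate", the following is a minimal illustrative sketch and not the analysis presented in the talk. It assumes a small, randomly generated tabular MDP (the sizes, seed, and discount factor are arbitrary choices) and applies the exact Bellman optimality operator, so there is no neural network and no statistical error. Under those assumptions, the sup-norm distance to the optimal action-value function shrinks by at least a factor of gamma per iteration.

```python
import numpy as np

def bellman_optimality(Q, P, R, gamma):
    """Apply the Bellman optimality operator to a tabular Q of shape (S, A).

    P has shape (S, A, S) with P[s, a, t] the probability of moving to
    state t after taking action a in state s; R has shape (S, A).
    """
    V = Q.max(axis=1)            # greedy state values, shape (S,)
    return R + gamma * (P @ V)   # (S, A, S) @ (S,) -> (S, A)

def sup_norm_errors(P, R, gamma, n_iters, n_ref=2000):
    """Track the sup-norm distance to Q* over n_iters exact iterations."""
    # Reference solution: iterate long enough that the remaining
    # error is far below floating-point precision.
    Q_star = np.zeros_like(R)
    for _ in range(n_ref):
        Q_star = bellman_optimality(Q_star, P, R, gamma)

    Q = np.zeros_like(R)
    errors = []
    for _ in range(n_iters):
        errors.append(np.abs(Q - Q_star).max())
        Q = bellman_optimality(Q, P, R, gamma)
    return errors

# Hypothetical toy problem: 5 states, 3 actions, discount factor 0.9.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # normalize rows into probabilities
R = rng.random((S, A))

errors = sup_norm_errors(P, R, gamma, n_iters=20)
for k, err in enumerate(errors):
    print(f"iteration {k:2d}: sup-norm error = {err:.6f}")
```

In DQN the exact operator is replaced by a regression step on sampled transitions with a neural network, which is where the statistical error described in the abstract enters; this sketch isolates only the algorithmic part.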

 

Bio: Jianqing Fan is a statistician, financial econometrician, and data scientist. He is the Frederick L. Moore '18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at Princeton University, where he chaired the department from 2012 to 2015. His honors include the 2000 COPSS Presidents' Award, the Morningside Gold Medal for Applied Mathematics (2007), a Guggenheim Fellowship (2009), the Pao-Lu Hsu Prize (2013), and the Guy Medal in Silver (2014). He was elected an Academician of Academia Sinica in 2012.

Additional Information

In Campus Calendar
Yes
Groups

School of Industrial and Systems Engineering (ISYE)

Invited Audience
Faculty/Staff, Postdoc, Public, Graduate students, Undergraduate students
Categories
Seminar/Lecture/Colloquium
Status
  • Created By: sbryantturner3
  • Workflow Status: Published
  • Created On: Aug 30, 2021 - 3:50pm
  • Last Updated: Oct 18, 2021 - 10:26am