ML@GT and ISyE invite you to a seminar by Warren B. Powell, professor of operations research and financial engineering at Princeton University.

For scheduling information, please contact Anton Kleywegt at anton.kleywegt@isye.gatech.edu

From Reinforcement Learning to Stochastic Optimization: A Universal Framework for Sequential Decision Analytics

Sequential decisions are an almost universal problem class, spanning dynamic resource allocation problems, control problems, discrete graph problems, active learning problems, as well as two-agent games and multi-agent problems. Application settings span engineering, the sciences, transportation, health services, medical decision making, energy, e-commerce and finance. A rich problem class involves systems that must actively learn about the environment, possibly via drones or robots. In multi-agent systems, we may need to learn about the behavior of other agents.

These problems have been addressed in the academic literature using a variety of modeling and algorithmic frameworks, including dynamic programming, stochastic programming, stochastic control, simulation optimization, approximate dynamic programming/reinforcement learning, and even multiarmed bandit problems.

I will describe a universal modeling framework that can be used for *any* sequential decision problem in the presence of different sources of uncertainty. The framework is centered on an optimization problem that optimizes over policies (rules for making decisions). We show that there are two fundamental strategies for designing policies (policy search and policies based on lookahead approximations), each of which further divides into two classes. This creates four (meta)classes of policies that are the foundation of *any* solution approach that has ever been proposed for a sequential problem. I will demonstrate these policies in two broad contexts: pure learning problems (“bandit problems”) and dynamic resource allocation problems, where I will use a simple energy storage problem to show that each of the four classes (and a hybrid) can be made to work best.

Warren Powell is a faculty member in the Department of Operations Research and Financial Engineering at Princeton University, where he has taught since 1981. In 1990, he founded CASTLE Laboratory, which spans research in computational stochastic optimization with applications initially in transportation and logistics. In 2011, he founded the Princeton laboratory for ENergy Systems Analysis (PENSA) to tackle the rich array of problems in energy systems analysis. In 2013, this morphed into “CASTLE Labs,” focusing on computational stochastic optimization and learning.

In the 1980’s, he designed and wrote SYSNET, an interactive optimization model for load planning at Yellow Freight System, where it is still in use after 25 years. In 1988, he founded the Princeton Transportation Consulting Group, which marketed the model as SuperSPIN. SuperSPIN was adopted by the entire less-than-truckload (LTL) industry, and in the 1990’s it stabilized an industry where 80 percent of the companies had gone bankrupt in the first decade after deregulation in 1980. SuperSPIN was used in the planning of American Freightways (which became FedEx Freight), Roadway Package System (which became FedEx Ground), and Overnight Transportation (which became UPS Freight).

Also in the 1980’s he developed a series of models for truckload trucking, starting with LoadMAP (written by Ken Nickerson ’84), which then evolved to an integrated stochastic model for driver assignment called MicroMAP (the senior thesis of David Cape ’87). As of 2011, MicroMAP was being used to dispatch over 66,000 drivers for 20 of the largest truckload carriers in the U.S.

He has started three consulting firms: Princeton Transportation Consulting Group (1988), Transport Dynamics (1995), and Optimal Dynamics (2016), whose CEO is his son, Daniel Powell. He has continued to do his developmental work through CASTLE Laboratory at Princeton University, where he has worked with the leading companies in less-than-truckload trucking (Yellow Freight System/YRC), parcel shipping (United Parcel Service), truckload trucking (Schneider National), rail (primarily Norfolk Southern Railway), and air (NetJets and Embraer), as well as the Air Mobility Command. As he moved into energy, he worked with PJM Interconnection (the grid operator for the mid-Atlantic states) and PSE&G (the utility that serves 75 percent of New Jersey).

Motivated by these applications, he developed a method for bridging dynamic programming with math programming to solve very high-dimensional stochastic, dynamic programs using the modeling and algorithmic framework of approximate dynamic programming. This work has been used in a variety of applications including fleet management at Schneider National (50,000 variables per time period, and a state variable with 10^{20} *dimensions*), the SMART energy resource planning model (175,000 time periods), and locomotive optimization at Norfolk Southern.

He identified four fundamental classes of policies for solving sequential decision problems, integrating fields such as stochastic programming, dynamic programming (including approximate dynamic programming/reinforcement learning), robust optimization, optimal control and stochastic search (to name a few). This work identified a new class of policy called a *parametric cost function approximation*.

His work in industry is balanced by contributions to the theory of stochastic optimization and machine learning.

Prizes and awards – Recipient of a *Docteur Honoris Causa* from the University of Quebec in Montreal in 2013. Winner, Daniel Wagner Prize for extending approximate dynamic programming to very high-dimensional problems for Schneider National. Best Paper Prize from the Society for Transportation Science and Logistics (once for this problem, and once for the ADP model for locomotive management at Norfolk Southern). His students have won many awards (Dantzig Prize for best dissertation in Operations Research, several winners of the Transportation Science dissertation prize, Doing Good with Good OR Competition honorable mention, Nicholson Prize finalist). Finalist in the prestigious Edelman competition in 1987 and 1991. Informs Fellows Award, Presidential Young Investigator Award.

Books: He is the author of *Approximate Dynamic Programming: Solving the Curses of Dimensionality* and co-author (with Ilya Ryzhov) of *Optimal Learning* (both published by Wiley). He is co-editor (with J. Si, A. Barto, and D. Wunsch) of *Learning and Approximate Dynamic Programming: Scaling up to the Real World*.

Just the numbers: $50+ million in research funding (in 2020 dollars), 250+ refereed papers, two books (plus an edited volume), ~60 Ph.D. students and post-docs (~30 in academia and research laboratories), 10 Masters, 200+ undergraduate senior theses, h-number (on Google) of 65, 18,000+ citations, 36,000+ visitors per year to my websites, 7,000+ connections on LinkedIn (some minuscule number on Facebook)… (let me know if you can think of any more).

He has served in numerous leadership and service roles, including President of the Transportation Science Section, Informs board of directors, director of several NSF workshops, Area Editor for transportation at Operations Research (8 years), and numerous prize, review and service committees. In 1991 he co-founded the triennial conference TRISTAN, now the leading international conference for transportation systems analysis. In 2003 he designed the Informs Impact Prize and served as the first chair in 2004.
