A Unified Framework on Sequential Decisions under Uncertainty

Georgia Tech, ISyE

Warren B Powell

Princeton University

April 3, 2020

Sequential decision problems arise in application areas that include engineering, the sciences, transportation, logistics, health services, medical decision making, energy, e-commerce and finance, along multiagent problems that arise with drones and robotics. These problems have been addressed in the academic literature using a variety of modeling and algorithmic frameworks, including dynamic programming, stochastic programming, optimal control, simulation optimization, approximate dynamic programming/reinforcement learning, and even multiarmed bandit problems.

In sharp contrast with the field of deterministic math programming which enjoys a canonical framework that is used around the world, sequential decision problems are described in the literature using at least eight fundamental modeling languages plus at least six more derivative dialects. Further, application communities have developed a wide range of solution methods that reflect the characteristics of specific problem classes.

In this workshop, I will introduce a single, canonical modeling framework that covers every sequential decision problem. The framework consists of five dimensions (drawing heavily on the approach used by stochastic control) that naturally represent real-world problems and map directly into software (and vice versa). The framework clearly lays out the elements of any sequential decision problem that have to be modeled, which helps to clarify the understanding of complex systems.

Special attention will be given to the modeling of state variables, which are a surprising source of confusion in the research literature. I will illustrate the different classes of state variables, including belief states, using a series of energy storage problems. Through proper handling of state variables, I show how we can model pure learning problems, pure resource allocation problems, hybrid learning/resource allocation problems, and contextual problems.

Our modeling strategy is centered on the challenge of optimizing over policies (functions for making decisions). I will describe four (meta) classes of policies, that can be divided into two broad groups: the policy search class (finding the best function that works well over time), and lookahead policies that approximate the downstream impact of making a decision now. I will claim that these four classes are universal: any method proposed in the literature (or used in practice) falls into one of these four classes, or a hybrid of two or more. I will illustrate parametric cost function approximations that are widely used in practice, but almost completely ignored by the academic community.

I will illustrate the four classes of policies in the context of pure learning problems, and the much richer class of state-dependent problems. In the process, I will highlight the strengths of policies in the policy search class (simplicity, ability to incorporate structure) and the weaknesses (tunable parameters). I will use an energy storage problem to show that we can make each of the four classes of policies (and possibly a hybrid) work best depending on the characteristics of the data.

]]>In sharp contrast with the field of deterministic math programming which enjoys a canonical framework that is used around the world, sequential decision problems are described in the literature using at least eight fundamental modeling languages plus at least six more derivative dialects. Further, application communities have developed a wide range of solution methods that reflect the characteristics of specific problem classes.

In this workshop, I will introduce a single, canonical modeling framework that covers every sequential decision problem. The framework consists of five dimensions (drawing heavily on the approach used by stochastic control) that naturally represent real-world problems and map directly into software (and vice versa). The framework clearly lays out the elements of any sequential decision problem that have to be modeled, which helps to clarify the understanding of complex systems.

Special attention will be given to the modeling of state variables, which are a surprising source of confusion in the research literature. I will illustrate the different classes of state variables, including belief states, using a series of energy storage problems. Through proper handling of state variables, I show how we can model pure learning problems, pure resource allocation problems, hybrid learning/resource allocation problems, and contextual problems.

Our modeling strategy is centered on the challenge of optimizing over policies (functions for making decisions). I will describe four (meta) classes of policies, that can be divided into two broad groups: the policy search class (finding the best function that works well over time), and lookahead policies that approximate the downstream impact of making a decision now. I will claim that these four classes are universal: any method proposed in the literature (or used in practice) falls into one of these four classes, or a hybrid of two or more. I will illustrate parametric cost function approximations that are widely used in practice, but almost completely ignored by the academic community.

I will illustrate the four classes of policies in the context of pure learning problems, and the much richer class of state-dependent problems. In the process, I will highlight the strengths of policies in the policy search class (simplicity, ability to incorporate structure) and the weaknesses (tunable parameters). I will use an energy storage problem to show that we can make each of the four classes of policies (and possibly a hybrid) work best depending on the characteristics of the data.

]]>