Splitting Randomized Stationary Policies in Markov Decision Processes
TITLE: Splitting Randomized Stationary Policies in Markov Decision Processes
SPEAKER: Eugene Feinberg
This talk discusses whether an occupation measure for a given randomized stationary policy for a Markov Decision Process (MDPs) can be represented as a convex combination of occupation measures for simpler policies. In particular, the question is whether an occupation measure for a randomized stationary policy can be represented as a convex integral combination of occupation measures for deterministic policies. If this is possible, we say that the policy can be split.
We present general results on such representations and focus our attention on two particular situations: (i) the given subset of states is a singleton (splitting at a state), and (ii) the given subset is finite and the policy uses a finite number of actions at this set (finite splitting). For splitting at a state, we present new formulas and provide necessary and sufficient conditions when a policy can be split. For finite splitting, we describe the structure of the split and provide an algorithm for its computation. Important properties of the structure of the finite splitting are that each two consecutive policies in the split differ only at one state and the number of summands in the convex combination is relatively small.
We apply finite splitting to constrained discounted MDPs with countable state spaces and prove that an optimal mixed stationary policy can be selected in such a way that each two consecutive stationary policies in the split differ at one state. Though this talk discusses infinite state and action MDPs, the results on finite splitting are new for discounted MDPs with finite state and action sets. We shall also discuss splitting of state-action frequencies in unichain MDPs with average rewards per unit time.
This talk is based in part on a joint paper with Eric V. Denardo and Uriel G. Rothblum.