Visiting Speaker Seminar
TITLE: Optimization and Learning for sequential decision making under uncertainty
SPEAKER: Dr. Shipra Agrawal
In this talk, I will present techniques that combine optimization and learning for decision making in complex, uncertain, online environments. Much of this work is motivated by challenges faced in modern revenue management problems, namely, a) unknown or difficult to estimate demand distributions b) multiple complex nonlinear constraints and objectives c) the need for fast large-scale algorithms, and d) personalized data-driven decisions. Formulating these problem aspects into an "online stochastic convex programming" framework, we devise fast algorithms that combine primal-dual paradigm with online learning to achieve provably optimal performance bounds. When applied to the special case of online packing, our ideas yield simpler and faster algorithms with optimal competitive ratio for this widely studied problem. An application area of our focus is internet advertising, where millions of ads need to be served every second to ensure long term revenues, without knowing the future demand patterns. Our algorithms have been implemented and are in use for yield management in display advertising.
I will further discuss a "bandit" property present in many revenue management problems, where the uncertain demand at each point in time is determined by the decision, and can only be observed "after" taking the decision. For example, in pay-per-click revenue model in internet advertising, a click (or no click) on an ad can be observed only after the ad is selected and displayed in response to the user query. Similar situation occurs in many other applications including posted-price based revenue management, worker-task allocation problem in crowdsourcing, machine scheduling, sensor network management etc. Modeling this problem as a combination of the classic multi-armed bandit problem and online stochastic convex programming, we design algorithms that balance the inherent exploration-exploitation (i.e., learning vs. optimization) tradeoff to achieve asymptotically optimal decision policies. Our results significantly improve upon several known results in online linear programming, blind network revenue management and multi-armed bandits literature, and reveal many interesting connections between convex optimization algorithms, Fenchel duality, multi-armed bandits and online learning.
This talk is based on joint work with Nikhil R. Devanur and Navin Goyal.
Shipra Agrawal is a researcher at Microsoft Research, India. She received a PhD in Computer Science from Stanford University in 2011, under the direction of Professor Yinyu Ye, Department of Management Science and Engineering. Her research spans several areas of optimization and machine learning, including data-driven optimization under partial, uncertain, and online inputs, and related concepts in learning, namely multi-armed bandits, online learning, and reinforcement learning. She is also interested in prediction markets and game theory. Application areas of her interest include, but are not limited to, internet advertising, recommendation systems, revenue management and resource allocation problems.