Reinforcement learning: a comparison of UCB versus alternative adaptive policies
Wesley Cowan, Michael N. Katehakis and Daniel Pirutinsky
Abstract
In this paper, we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art of the area, and we compare the performance of the classic UCB policy of Burnetas and Katehakis [10] with a new policy developed herein that we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) and a method based on posterior sampling (MDP-PS).
Chapters in this book
- Frontmatter
- Preface
- Contents
- Geometric analysis in the development of shocks in compressible fluids
- Ancient solutions to geometric flows
- Boundary value problems, medical imaging and the asymptotics of Riemann’s zeta function
- A short glimpse of the giant footprint of Fourier analysis and recent multilinear advances
- Fractional calculus and numerical methods for fractional PDEs
- Reinforcement learning: a comparison of UCB versus alternative adaptive policies
- Mathematics of computational modelling: some challenges of computing nonlinear phenomena
- Sharp estimates for dyadic-type maximal operators and stability
- Data structures for robust multifrequency imaging
- Theta and eta polynomials in geometry, Lie theory, and combinatorics