Reinforcement learning: a comparison of UCB versus alternative adaptive policies

  • Wesley Cowan, Michael N. Katehakis and Daniel Pirutinsky
This chapter is in the book First Congress of Greek Mathematicians

Abstract

In this paper, we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art of the area, and we compare the performance of the classic UCB policy of Burnetas and Katehakis [10] with a new policy developed herein that we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).
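
To make the comparison concrete, the following is a minimal, generic sketch of a UCB-style adaptive policy for a tabular MDP with unknown transitions. It is not the exact Burnetas–Katehakis index policy (nor MDP-DMED or MDP-PS); it only illustrates the shared optimism-in-the-face-of-uncertainty idea: act greedily with respect to value estimates inflated by a confidence bonus that shrinks as state-action visit counts grow. All names and parameters (n_states, n_actions, bonus_scale, the 1/sqrt(N) bonus) are illustrative assumptions, not quantities from the chapter.

```python
# Generic UCB-style (optimistic) adaptive policy for a tabular MDP with
# unknown transition probabilities.  Illustrative sketch only; not the
# Burnetas-Katehakis indices, MDP-DMED, or MDP-PS from the chapter.
import numpy as np


class UCBMDPAgent:
    def __init__(self, n_states, n_actions, gamma=0.95, bonus_scale=1.0):
        self.nS, self.nA = n_states, n_actions
        self.gamma = gamma
        self.bonus_scale = bonus_scale
        self.counts = np.zeros((n_states, n_actions))                   # N(s, a)
        self.trans_counts = np.zeros((n_states, n_actions, n_states))   # N(s, a, s')
        self.reward_sums = np.zeros((n_states, n_actions))              # cumulative rewards
        self.Q = np.zeros((n_states, n_actions))                        # optimistic Q estimates

    def select_action(self, state):
        # Try each action at least once, then act greedily on the optimistic Q.
        untried = np.where(self.counts[state] == 0)[0]
        if untried.size > 0:
            return int(untried[0])
        return int(np.argmax(self.Q[state]))

    def update(self, s, a, r, s_next):
        # Record the observed transition and reward, then refresh the estimates.
        self.counts[s, a] += 1
        self.trans_counts[s, a, s_next] += 1
        self.reward_sums[s, a] += r
        self._optimistic_value_iteration()

    def _optimistic_value_iteration(self, n_iters=50):
        # Empirical model: mean rewards and transition frequencies.
        N = np.maximum(self.counts, 1)
        r_hat = self.reward_sums / N
        p_hat = self.trans_counts / N[:, :, None]
        # UCB-style exploration bonus decaying as 1/sqrt(N(s, a)).
        bonus = self.bonus_scale / np.sqrt(N)
        Q = np.zeros((self.nS, self.nA))
        for _ in range(n_iters):
            V = Q.max(axis=1)                       # greedy value per state
            Q = r_hat + bonus + self.gamma * (p_hat @ V)
        self.Q = Q
```

In each interaction round, select_action(s) chooses the action, the environment returns the reward r and next state s', and update(s, a, r, s') refreshes the empirical model and the optimistic values; the chapter's policies differ mainly in how this exploration bonus (or the posterior sample) is constructed.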
