
Reinforcement learning: a comparison of UCB versus alternative adaptive policies

Wesley Cowan, Michael N. Katehakis and Daniel Pirutinsky

Abstract

In this paper, we consider the basic version of Reinforcement Learning (RL), which involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art of the area, and we compare the performance of the classic UCB policy of Burnetas and Katehakis [10] with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).
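To illustrate the kind of posterior-sampling approach referred to above, the following is a minimal sketch (not the authors' MDP-PS algorithm) for a finite MDP with unknown transition probabilities: Dirichlet posteriors over each transition distribution are resampled at the start of every episode, the sampled MDP is solved by value iteration, and the resulting greedy policy is followed while the posterior counts are updated. The reward matrix R, the simulator env_step, and all parameter names here are illustrative assumptions.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # Solve a sampled MDP (P[s, a, s'], R[s, a]) by value iteration;
    # return the greedy policy of the sampled model.
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * (P @ V)          # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)
        V = V_new

def mdp_posterior_sampling(env_step, R, n_states, n_actions,
                           n_episodes=200, horizon=50, gamma=0.95, seed=0):
    # Illustrative posterior-sampling loop: env_step(s, a) -> next state is an
    # assumed simulator interface, rewards R[s, a] are assumed known.
    rng = np.random.default_rng(seed)
    # Dirichlet(1, ..., 1) prior on each transition distribution P(. | s, a).
    counts = np.ones((n_states, n_actions, n_states))
    s = 0
    for _ in range(n_episodes):
        # 1. Sample one plausible MDP from the current posterior.
        P_sample = np.array([[rng.dirichlet(counts[i, a])
                              for a in range(n_actions)]
                             for i in range(n_states)])
        # 2. Act greedily with respect to the sampled MDP for one episode.
        policy = value_iteration(P_sample, R, gamma)
        for _ in range(horizon):
            a = policy[s]
            s_next = env_step(s, a)
            # 3. Update the posterior with the observed transition.
            counts[s, a, s_next] += 1
            s = s_next
    return counts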
