
Reinforcement learning: a comparison of UCB versus alternative adaptive policies

Wesley Cowan, Michael N. Katehakis and Daniel Pirutinsky

Abstract

In this paper, we consider the basic version of Reinforcement Learning (RL), which involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art of the area, and we compare the performance of the classic UCB policy of Burnetas and Katehakis [10] with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).
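To illustrate the kind of posterior-sampling approach referred to above, the following is a minimal sketch (not the authors' MDP-PS algorithm) for a finite MDP with unknown transition probabilities: Dirichlet posteriors over each transition distribution are resampled at the start of every episode, the sampled MDP is solved by value iteration, and the resulting greedy policy is followed while the posterior counts are updated. The reward matrix R, the simulator env_step, and all parameter names here are illustrative assumptions.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # Solve a sampled MDP (P[s, a, s'], R[s, a]) by value iteration;
    # return the greedy policy of the sampled model.
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * (P @ V)          # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)
        V = V_new

def mdp_posterior_sampling(env_step, R, n_states, n_actions,
                           n_episodes=200, horizon=50, gamma=0.95, seed=0):
    # Illustrative posterior-sampling loop: env_step(s, a) -> next state is an
    # assumed simulator interface, rewards R[s, a] are assumed known.
    rng = np.random.default_rng(seed)
    # Dirichlet(1, ..., 1) prior on each transition distribution P(. | s, a).
    counts = np.ones((n_states, n_actions, n_states))
    s = 0
    for _ in range(n_episodes):
        # 1. Sample one plausible MDP from the current posterior.
        P_sample = np.array([[rng.dirichlet(counts[i, a])
                              for a in range(n_actions)]
                             for i in range(n_states)])
        # 2. Act greedily with respect to the sampled MDP for one episode.
        policy = value_iteration(P_sample, R, gamma)
        for _ in range(horizon):
            a = policy[s]
            s_next = env_step(s, a)
            # 3. Update the posterior with the observed transition.
            counts[s, a, s_next] += 1
            s = s_next
    return counts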
