Reinforcement learning: a comparison of UCB versus alternative adaptive policies

  • Wesley Cowan, Michael N. Katehakis and Daniel Pirutinsky
This chapter is in the book First Congress of Greek Mathematicians

Abstract

In this paper, we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art of the area, and we compare the performance of the classic UCB policy of Burnetas and Katehakis [10] with a new policy developed herein that we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).
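
To make the comparison concrete, the following is a minimal, generic sketch of a UCB-style adaptive policy for a tabular MDP with unknown transitions. It is not the exact Burnetas–Katehakis index policy (nor MDP-DMED or MDP-PS); it only illustrates the shared optimism-in-the-face-of-uncertainty idea: act greedily with respect to value estimates inflated by a confidence bonus that shrinks as state-action visit counts grow. All names and parameters (n_states, n_actions, bonus_scale, the 1/sqrt(N) bonus) are illustrative assumptions, not quantities from the chapter.

```python
# Generic UCB-style (optimistic) adaptive policy for a tabular MDP with
# unknown transition probabilities.  Illustrative sketch only; not the
# Burnetas-Katehakis indices, MDP-DMED, or MDP-PS from the chapter.
import numpy as np


class UCBMDPAgent:
    def __init__(self, n_states, n_actions, gamma=0.95, bonus_scale=1.0):
        self.nS, self.nA = n_states, n_actions
        self.gamma = gamma
        self.bonus_scale = bonus_scale
        self.counts = np.zeros((n_states, n_actions))                   # N(s, a)
        self.trans_counts = np.zeros((n_states, n_actions, n_states))   # N(s, a, s')
        self.reward_sums = np.zeros((n_states, n_actions))              # cumulative rewards
        self.Q = np.zeros((n_states, n_actions))                        # optimistic Q estimates

    def select_action(self, state):
        # Try each action at least once, then act greedily on the optimistic Q.
        untried = np.where(self.counts[state] == 0)[0]
        if untried.size > 0:
            return int(untried[0])
        return int(np.argmax(self.Q[state]))

    def update(self, s, a, r, s_next):
        # Record the observed transition and reward, then refresh the estimates.
        self.counts[s, a] += 1
        self.trans_counts[s, a, s_next] += 1
        self.reward_sums[s, a] += r
        self._optimistic_value_iteration()

    def _optimistic_value_iteration(self, n_iters=50):
        # Empirical model: mean rewards and transition frequencies.
        N = np.maximum(self.counts, 1)
        r_hat = self.reward_sums / N
        p_hat = self.trans_counts / N[:, :, None]
        # UCB-style exploration bonus decaying as 1/sqrt(N(s, a)).
        bonus = self.bonus_scale / np.sqrt(N)
        Q = np.zeros((self.nS, self.nA))
        for _ in range(n_iters):
            V = Q.max(axis=1)                       # greedy value per state
            Q = r_hat + bonus + self.gamma * (p_hat @ V)
        self.Q = Q
```

In each interaction round, select_action(s) chooses the action, the environment returns the reward r and next state s', and update(s, a, r, s') refreshes the empirical model and the optimistic values; the chapter's policies differ mainly in how this exploration bonus (or the posterior sample) is constructed.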
