5 Reinforcement learning

Nazeer Shaik; Chandra Sekaran; Amit Mahajan; Balkeshwar Singh

Abstract

This chapter introduces the field of reinforcement learning (RL) and surveys several RL techniques. The main goal of RL is to develop algorithms that let agents learn optimal policies through interaction with their environment, maximizing cumulative reward. The first part of the chapter discusses Markov decision processes (MDPs), which provide the mathematical foundation for modeling RL problems, and examines value iteration and policy iteration as iterative methods for solving MDPs. We then present Q-learning, an off-policy, model-free RL algorithm for learning the optimal action-value function. Deep Q-networks (DQNs), which combine Q-learning with deep neural networks, are also covered as a way to handle high-dimensional state spaces. Policy gradient methods are presented as an alternative approach that directly optimizes policy parameters using gradient ascent. Proximal policy optimization (PPO), a leading policy gradient algorithm, is discussed for its ability to balance training stability and policy performance. The chapter concludes by emphasizing the significance of RL methods in training agents to make sequential decisions in complex environments across various domains.
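As a rough illustration of the Q-learning update mentioned in the abstract, the following is a minimal tabular sketch, not the chapter's own implementation. The function name `q_learning_update`, the state and action labels, and the values of the learning rate `alpha` and discount factor `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q-value.

    The target uses the greedy (max) value of the next state, regardless of
    which action the agent actually takes next; this is what makes the
    method off-policy. alpha and gamma are illustrative values only.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-state example: all Q-values start at zero.
q = defaultdict(float)
actions = ["left", "right"]

first = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
second = q_learning_update(q, "s1", "right", 2.0, "s0", actions)
print(first, second)
```

In a full agent, this update would typically be called inside a loop that selects actions with an exploration strategy such as epsilon-greedy; that loop is omitted here to keep the sketch short.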
