This is the code repository for Deep Reinforcement Learning Hands-On, Third Edition, published by Packt.
Maxim Lapan
Reward yourself and take this journey into RL with the third edition of Deep Reinforcement Learning Hands-On. The book takes you from the basics of RL to more advanced concepts with the help of various applications, including game playing, discrete optimization, stock trading, and web browser navigation. By walking you through landmark research papers in the field, this deep reinforcement learning book will equip you with the practical know-how of RL and the theoretical foundation to understand and implement most modern RL papers.

The book retains its strengths by providing concise and easy-to-follow explanations. You’ll work through practical and diverse examples, from grid environments and games to stock trading and RL agents in web environments, to give you a well-rounded understanding of RL, its capabilities, and its use cases. You’ll learn about key topics such as deep Q-networks (DQNs), policy gradient methods, continuous control problems, and highly scalable, non-gradient methods. If you want to learn about RL through a practical approach using OpenAI Gym and PyTorch, concise explanations, and the incremental development of topics, then Deep Reinforcement Learning Hands-On, Third Edition is your ideal companion.
- Stay on the cutting edge with new content on MuZero, RL with human feedback, and LLMs
- Evaluate RL methods, including cross-entropy, DQN, actor-critic, TRPO, PPO, DDPG, and D4PG
- Implement RL algorithms using PyTorch and modern RL libraries
- Build and train deep Q-networks to solve complex tasks in Atari environments
- Speed up RL models using algorithmic and engineering approaches
- Leverage advanced techniques like proximal policy optimization (PPO) for more stable training
- What Is Reinforcement Learning?
- OpenAI Gym
- Deep Learning with PyTorch
- The Cross-Entropy Method
- Tabular Learning and the Bellman Equation
- Deep Q-Networks
- Higher-Level RL Libraries
- DQN Extensions
- Ways to Speed up RL
- Stocks Trading Using RL
- Policy Gradients – an Alternative
- Actor-Critic Methods – A2C and A3C
- The TextWorld Environment
- Web Navigation
- Continuous Action Space
- Trust Regions – PPO, TRPO, ACKTR, and SAC
- Black-Box Optimization in RL
- Advanced Exploration
- RL with Human Feedback
- MuZero
- RL in Discrete Optimization
- Multi-agent RL
The examples in this book were implemented and tested using Python 3.11. I assume that you’re already familiar with the language and common concepts such as virtual environments, so I won’t cover in detail how to install packages or how to do so in an isolated way. The examples use Python type annotations, which allow us to provide type signatures for functions and class methods (you can see this style in the short example after the following list). Nowadays, there are lots of ML and RL libraries available, but in this book, I tried to keep the list of dependencies to a minimum, giving preference to our own implementations of methods over the blind import of third-party libraries. The external libraries that we will use in this book are open source software, and they include the following:
- NumPy: This is a library for scientific computing that implements matrix operations and common mathematical functions.
- OpenCV Python bindings: This is a computer vision library that provides many functions for image processing.
- Gymnasium from the Farama Foundation (https://farama.org): This is a maintained fork of the OpenAI Gym library (https://github.com/openai/gym) and an RL framework that provides a wide range of environments accessible through a unified API (see the short example after this list).
- PyTorch: This is a flexible and expressive deep learning (DL) library. A short crash course on it will be given in Chapter 3.
- PyTorch Ignite: This is a set of high-level tools on top of PyTorch used to reduce boilerplate code. It will be covered briefly in Chapter 3. The full documentation is available here: https://pytorch-ignite.ai/.
- PTAN (https://github.com/Shmuma/ptan): This is an open-source extension to the OpenAI Gym API that I created to support modern deep RL methods and building blocks. All the classes used will be described in detail, together with the source code.
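As a quick taste of both the unified Gymnasium API and the type-annotation style mentioned above, here is a minimal sketch of a random agent playing one CartPole episode (the `run_random_episode` helper is illustrative and not taken from the book):

```python
import gymnasium as gym

def run_random_episode(env_name: str = "CartPole-v1") -> float:
    """Play one episode with a random agent and return the total reward."""
    env = gym.make(env_name)
    obs, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        # Every Gymnasium environment exposes the same reset/step interface.
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        done = terminated or truncated
    env.close()
    return total_reward

if __name__ == "__main__":
    print(f"Episode reward: {run_random_episode():.1f}")
```

The same loop works unchanged for any other Gymnasium environment; only the environment name and the action space differ.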
Other libraries will be used in specific chapters; for example, we will use Microsoft TextWorld to play text-based games, PyBullet and MuJoCo for robotic simulations, Selenium for browser-based automation problems, and so on. Those specialized chapters include installation instructions for the libraries they need.

A significant portion of this book (Parts 2, 3, and 4) is focused on the modern deep RL methods that have been developed over the past few years. The word “deep” in this context means that DL is heavily used, and you may be aware that DL methods are computationally hungry. One modern graphics processing unit (GPU) can be 10 to 100 times faster than even the fastest multi-CPU system. In practice, this means that the same code that takes one hour to train on a system with a GPU could take from half a day to one week even on the fastest CPU system. This doesn’t mean that you can’t try the examples from this book without access to a GPU, but it will take longer. To experiment with the code on your own (the most useful way to learn anything), it is better to get access to a machine with a GPU; a quick PyTorch check for GPU availability is shown after the dependency list below. Getting a GPU can be done in various ways:
- Buying a modern GPU suitable for CUDA and supported by the PyTorch framework
- Using cloud instances. Both Amazon Web Services and Google Cloud Platform can provide you with GPU-powered instances
- Using Google Colab, which offers free GPU access in its Jupyter notebooks

To give you the exact versions of the external dependencies that we will use throughout the book, here is the requirements.txt file (please note that it was tested on Python 3.11; different versions might require you to tweak the dependencies, or might not work at all):
gymnasium[atari]==0.29.1
gymnasium[classic-control]==0.29.1
gymnasium[accept-rom-license]==0.29.1
moviepy==1.0.3
numpy<2
opencv-python==4.10.0.84
torch==2.5.0
torchvision==0.20.0
pytorch-ignite==0.5.1
tensorboard==2.18.0
mypy==1.8.0
ptan==0.8.1
stable-baselines3==2.3.2
torchrl==0.6.0
ray[tune]==2.37.0
pytest
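The dependencies can be installed in the usual way with `pip install -r requirements.txt`, preferably inside a virtual environment. After that, a minimal check that PyTorch can see a CUDA GPU, as discussed above, looks like this (the output will vary with your hardware):

```python
import torch

# Report whether a CUDA-capable GPU is visible to PyTorch
# and pick the device that the training code should use.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No GPU found; training will run on the CPU and will be much slower")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```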
Maxim Lapan has been working as a software developer for more than 20 years and has been involved in various areas: distributed scientific computing, distributed systems, and big data processing. Since 2014, he has been actively using machine and deep learning to solve practical industrial tasks, such as NLP problems and RL for web crawling and web page analysis. He lives in Germany with his family.