Q-Learning Notebook - Play the N-Chain Environment with three Agents
This repository contains a Jupyter Notebook with an implementation of a Q-Learning agent, which learns to solve the n-Chain OpenAI Gym environment.
This notebook is inspired by the following notebook: Deep Reinforcement Learning Course Notebook
Q-Learning
The notebook contains a Q-Learning algorithm implementation and a training loop to solve the n-Chain OpenAI Gym environment. The Q-Learning algorithm is an off-policy temporal-difference control algorithm [1]:
Image taken from Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second edition, 2014/2015, page 158
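A minimal sketch of the tabular update behind the algorithm above, written in plain Python rather than taken from the notebook (the function names are hypothetical). The Q-table is a list of lists indexed as `q[state][action]`, the behaviour policy is epsilon-greedy, and the update target uses the maximum over the next state's actions regardless of which action the agent actually takes next, which is what makes Q-Learning off-policy. Terminal states are not handled, assuming (as with n-Chain) that an episode is only cut off by a step limit.

```python
import random
from typing import List

# Q-table: q[state][action] holds the current action-value estimate.
QTable = List[List[float]]


def epsilon_greedy(q: QTable, state: int, epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    n_actions = len(q[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)  # explore
    # Exploit: break ties between equally valued actions at random.
    best = max(q[state])
    return rng.choice([a for a in range(n_actions) if q[state][a] == best])


def q_learning_update(q: QTable, state: int, action: int, reward: float,
                      next_state: int, alpha: float, gamma: float) -> None:
    """In-place update: Q(S,A) += alpha * [R + gamma * max_a Q(S',a) - Q(S,A)]."""
    # The target bootstraps from the best next action (off-policy),
    # not from the action the epsilon-greedy policy will pick next.
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])
```

For example, starting from an all-zero table, `q_learning_update(q, 0, 1, 2.0, 0, alpha=0.1, gamma=0.9)` moves `q[0][1]` from 0 to 0.2.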
The Q-Learning Agents
In this notebook we let different Q-Learning agents play the n-Chain environment and compare how they perform. The following agents are implemented:
- 🤓 Smart Agent 1: the agent explores and takes future rewards into account
- 🤑 Greedy Agent 2: the agent cares mainly about immediate rewards (small gamma)
- 😳 Shy Agent 3: the agent barely explores the environment (small epsilon)
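The three agents differ only in their hyperparameters. The sketch below records them in a small dataclass; the numbers are hypothetical placeholders chosen to reflect the descriptions above (a small gamma for the greedy agent, a small epsilon for the shy one), not the notebook's actual settings, and epsilon is assumed to stay fixed rather than decay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hyperparameters of one tabular Q-Learning agent (hypothetical values)."""
    name: str
    alpha: float    # learning rate
    gamma: float    # discount factor: how much future rewards count
    epsilon: float  # probability of a random action in epsilon-greedy selection


# Illustrative placeholder values only; the notebook may use different ones.
AGENTS = [
    AgentConfig("Smart", alpha=0.1, gamma=0.95, epsilon=0.5),
    AgentConfig("Greedy", alpha=0.1, gamma=0.1, epsilon=0.5),
    AgentConfig("Shy", alpha=0.1, gamma=0.95, epsilon=0.01),
]


def weight_of_future_reward(cfg: AgentConfig, steps_ahead: int) -> float:
    """Discount applied to a reward received `steps_ahead` steps from now."""
    return cfg.gamma ** steps_ahead
```

`weight_of_future_reward` shows why gamma matters on a chain: with these placeholder values, a reward four steps ahead is weighted by about 0.81 for the smart agent but only about 0.0001 for the greedy one.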
The n-Chain Environment
The n-Chain environment is taken from the OpenAI Gym module; see the Gym documentation for details.
The image below shows an example of a 5-Chain (n = 5) environment with 5 states, where a stands for action and r for reward (Image Source).
States
This environment contains a chain with n positions, and every chain position corresponds to a possible state the agent can be in:
state | description |
---|---|
0, 1, ..., n-1 (default n = 5) | position on the chain, counted from state 0 |
Actions and Rewards
The agent can move along the chain using two actions, each of which gives a different reward:
action | reward | description |
---|---|---|
0 | get no reward | move forward along the chain (state → state + 1) |
1 | get a small reward of 2 | jump back to state 0 |
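The transitions in the table can be reproduced with a few lines of plain Python. `NChainSketch` below is a hypothetical stand-in, not Gym's class: `step` returns only `(next_state, reward)` instead of Gym's full tuple, and it also pays the large reward of 10 for moving forward at the last position. Gym's version additionally flips the chosen action with a small "slip" probability; here `slip` defaults to 0 so the dynamics match the table exactly.

```python
import random
from typing import Optional, Tuple


class NChainSketch:
    """Hypothetical, minimal re-implementation of the n-Chain dynamics."""

    def __init__(self, n: int = 5, small: float = 2.0, large: float = 10.0,
                 slip: float = 0.0, seed: Optional[int] = None):
        self.n = n            # number of chain positions (states 0 .. n-1)
        self.small = small    # reward for action 1 (jump back to state 0)
        self.large = large    # reward for moving forward at the end of the chain
        self.slip = slip      # probability that the chosen action is flipped
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float]:
        """Apply action 0 (forward) or 1 (back); return (next_state, reward)."""
        if action not in (0, 1):
            raise ValueError("action must be 0 or 1")
        if self.rng.random() < self.slip:
            action = 1 - action  # the environment "slips" and performs the other action
        if action == 1:
            self.state = 0
            return self.state, self.small
        if self.state < self.n - 1:
            self.state += 1
            return self.state, 0.0
        # At the last position, moving forward keeps the agent there and pays `large`.
        return self.state, self.large
```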
The end of the chain holds a large reward of 10: while standing at the last position and still moving forward (action 0), the agent can collect this large reward repeatedly.
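Because the large reward repeats, its value can be compared with the small reward as a geometric series. Assuming no slip and a start at state 0, always jumping back yields a discounted return of 2 / (1 - gamma), while always moving forward yields 10 * gamma^(n-1) / (1 - gamma), since the first n - 1 steps earn nothing. The helper names below are hypothetical, and the sketch only compares these two fixed policies.

```python
def return_always_back(gamma: float, small: float = 2.0) -> float:
    """Discounted return of always taking action 1 from state 0: sum_t small * gamma**t."""
    return small / (1.0 - gamma)


def return_always_forward(gamma: float, n: int = 5, large: float = 10.0) -> float:
    """Discounted return of always taking action 0 from state 0.

    The first n-1 steps walk to the end of the chain with reward 0; from step
    n-1 on, every step pays `large`: sum_{t >= n-1} large * gamma**t.
    """
    return large * gamma ** (n - 1) / (1.0 - gamma)


def break_even_gamma(n: int = 5, small: float = 2.0, large: float = 10.0) -> float:
    """Discount factor above which walking to the end beats jumping back."""
    # large * gamma**(n-1) > small  <=>  gamma > (small / large) ** (1 / (n-1))
    return (small / large) ** (1.0 / (n - 1))
```

With the defaults (n = 5), gamma = 0.9 gives about 65.6 for walking forward versus 20 for jumping back, whereas gamma = 0.5 gives 1.25 versus 4. The break-even discount is (2/10)^(1/4), roughly 0.67, which suggests why an agent with a small gamma, like the Greedy Agent, tends to settle for the small reward.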
Additional Resources About Reinforcement Learning
- OpenAI Gym: Gym is a toolkit for developing and comparing reinforcement learning algorithms from OpenAI
- OpenAI Baselines: OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms
- Spinning Up in Deep RL: This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning
- A Long Peek into Reinforcement Learning: Great blog post by Lilian Weng, in which she briefly goes over the field of Reinforcement Learning (RL), from fundamental concepts to classic algorithms
- Policy Gradient Algorithms: Another great blog post by Lilian Weng, in which she writes about policy gradient algorithms