Implementation of the Rainbow paper: Combining Improvements in Deep Reinforcement Learning. After the introduction of Deep Q-Networks (DQN) in 2015, five further methods were proposed to improve the performance of the original DQN algorithm. These methods are:
- Double Q-Learning
- Dueling architecture
- Prioritized experience replay
- Distributional reinforcement learning
- Noisy Nets
Rainbow combines all of these methods, together with multi-step learning, and shows that the final combination performs significantly better than any of the individual methods alone.
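As a rough illustration of the multi-step (n-step) learning component mentioned above, the sketch below collapses the last n transitions into a single n-step transition; the tuple layout `(state, action, reward, next_state, done)` is an assumption for illustration, not the repository's exact storage format.

```python
from collections import deque

def n_step_transition(buffer: deque, gamma: float = 0.99, n_step: int = 3):
    """Collapse the last `n_step` transitions into one n-step transition.

    `buffer` holds (state, action, reward, next_state, done) tuples (illustrative layout).
    """
    reward, next_state, done = 0.0, buffer[-1][3], buffer[-1][4]
    for idx, (_, _, r, s_next, d) in enumerate(buffer):
        reward += (gamma ** idx) * r
        if d:  # the episode ended inside the n-step window
            next_state, done = s_next, True
            break
    state, action = buffer[0][0], buffer[0][1]
    return state, action, reward, next_state, done
```

The values `gamma = 0.99` and `n_step = 3` correspond to the hyperparameters listed in the table further below.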
Gameplay demos (GIFs): Pong and Boxing.
Training curves for Pong (x-axis: episode number): running reward and mean reward of the last ten episodes.
- The obvious learning phase starts around episode 1200, and the agent reaches its best performance around episode 1600.
- PongNoFrameskip-v4
- BoxingNoFrameskip-v4
- MsPacmanNoFrameskip-v4
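For a quick sanity check of these raw `NoFrameskip-v4` Atari environments (assuming `gym` and its Atari dependency `atari-py` are installed; frame skipping and stacking are added later by wrappers), a minimal snippet looks like:

```python
import gym

env = gym.make("PongNoFrameskip-v4")  # raw Atari frames, no built-in frame skipping
print(env.observation_space.shape)    # (210, 160, 3) RGB frames
print(env.action_space.n)             # 6 discrete actions for Pong
env.close()
```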
All values are based on the Rainbow paper, except `final_annealing_beta_steps`, which was chosen by trial and error, and `initial_mem_size_to_train`, which was reduced because of limited computational resources. Also, instead of hard target-network updates, the soft-update technique from the DDPG paper was applied (see the sketch after the table below).
Parameters | Value |
---|---|
lr | 6.25e-5 |
n_step | 3 |
batch_size | 32 |
gamma | 0.99 |
tau (based on DDPG paper) | 0.001 |
train_period (number of steps between each optimization) | 4 |
v_min | -10 |
v_max | 10 |
n_atoms | 51 |
adam epsilon | 1.5e-4 |
alpha | 0.5 |
beta | 0.4 |
clip_grad_norm | 10 |
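As a hedged sketch of the soft target-network update mentioned above (the names `online_net` and `target_net` are illustrative, not the repository's actual identifiers), the DDPG-style rule blends the target parameters toward the online parameters with `tau = 0.001`:

```python
import torch

@torch.no_grad()
def soft_update(online_net: torch.nn.Module, target_net: torch.nn.Module, tau: float = 0.001):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target (Polyak averaging)."""
    for online_param, target_param in zip(online_net.parameters(), target_net.parameters()):
        target_param.data.copy_(tau * online_param.data + (1 - tau) * target_param.data)
```

Called after every optimization step, this moves the target network only slightly each time, replacing the periodic hard copy of the original DQN.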
├── Brain
│ ├── agent.py
│ └── model.py
├── Common
│ ├── config.py
│ ├── logger.py
│ ├── play.py
│ └── utils.py
├── main.py
├── Memory
│ ├── replay_memory.py
│ └── segment_tree.py
├── README.md
├── requirements.txt
└── Results
├── 10_last_mean_reward.png
├── rainbow.gif
└── running_reward.png
- The Brain directory contains the neural network structure and the agent's decision-making core.
- Common contains auxiliary code shared by most RL projects, such as configuration, logging, and wrapping Atari environments.
- main.py is the entry point of the code; it manages all other parts and makes the agent interact with the environment.
- Memory contains the agent's replay memory with the prioritized experience replay extension (see the sketch below).
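As a rough sketch of the idea behind `Memory/segment_tree.py` (proportional prioritized sampling with a sum tree; this is an illustrative sketch, not the repository's actual implementation):

```python
import random

class SumTree:
    """Minimal sum tree for proportional prioritized sampling (illustrative)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # leaves live at indices [capacity, 2 * capacity)

    def update(self, index: int, priority: float):
        """Set the priority of leaf `index` and propagate the change up to the root."""
        i = index + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self) -> int:
        """Draw a leaf index with probability proportional to its priority."""
        value = random.uniform(0.0, self.tree[1])  # tree[1] holds the total priority
        i = 1
        while i < self.capacity:  # descend until a leaf is reached
            left = 2 * i
            if value <= self.tree[left]:
                i = left
            else:
                value -= self.tree[left]
                i = left + 1
        return i - self.capacity
```

Priorities are typically the absolute TD errors raised to the power `alpha` (0.5 in the table above), and the resulting sampling bias is corrected with importance-sampling weights annealed via `beta`.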
- gym == 0.17.2
- numpy == 1.19.1
- opencv_contrib_python == 3.4.0.12
- psutil == 5.4.2
- torch == 1.4.0
pip3 install -r requirements.txt
main.py [-h] [--algo ALGO] [--mem_size MEM_SIZE] [--env_name ENV_NAME]
[--interval INTERVAL] [--do_train] [--train_from_scratch]
[--do_intro_env]
Variable parameters based on the configuration of the machine or user's choice
optional arguments:
-h, --help show this help message and exit
--algo ALGO The algorithm which is used to train the agent.
--mem_size MEM_SIZE The memory size.
--env_name ENV_NAME Name of the environment.
--interval INTERVAL The interval specifies how often different parameters
should be saved and printed, counted by episodes.
--do_train The flag determines whether to train the agent or play
with it.
--train_from_scratch The flag determines whether to train from scratch or
continue a previous run. [default=True]
--do_intro_env Only introduce the environment then close the program.
- In order to train the agent with default arguments, execute the following command and use the `--do_train` flag to train the agent (you may change the memory capacity and the environment as you wish):
python3 main.py --do_train --algo="rainbow" --mem_size=150000 --env_name="BreakoutNoFrameskip-v4" --interval=100 --train_from_scratch
- If you want to continue training from a previous run, execute the following:
python3 main.py --do_train --algo="rainbow" --mem_size=150000 --env_name="PongNoFrameskip-v4" --interval=100
- The whole training procedure was done on Google Colab and took less than 15 hours, so a machine with a similar configuration should be sufficient; if you need a more powerful free online GPU provider, take a look at paperspace.com.
- Human-level control through deep reinforcement learning, Mnih et al., 2015
- Deep Reinforcement Learning with Double Q-learning, Van Hasselt et al., 2015
- Dueling Network Architectures for Deep Reinforcement Learning, Wang et al., 2015
- Prioritized Experience Replay, Schaul et al., 2015
- A Distributional Perspective on Reinforcement Learning, Bellemare et al., 2017
- Noisy Networks for Exploration, Fortunato et al., 2017
- Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al., 2017
- @Curt-Park for rainbow is all you need.
- @higgsfield for RL-Adventure.
- @wenh123 for NoisyNet-DQN.
- @qfettes for DeepRL-Tutorials.
- @AdrianHsu for breakout-Deep-Q-Network.
- @Kaixhin for Rainbow.
- @Kchu for DeepRL_PyTorch.