License: Apache 2.0
Rainbow-IQN Ape-X is a new distributed state-of-the-art algorithm on Atari, combining the following three papers:
- Rainbow: Combining Improvements in Deep Reinforcement Learning [1].
- IQN: Implicit Quantile Networks for Distributional Reinforcement Learning [2].
- Ape-X: Distributed Prioritized Experience Replay [3].
This repository is an open-source implementation of a distributed version of Rainbow-IQN, following the Ape-X paper for the distributed part (there is also a distributed version of Rainbow only, i.e. Rainbow Ape-X).
The code presented here is the basis of our paper Is Deep Reinforcement Learning really superhuman on Atari? [4], in which we introduce SABER: a Standardized Atari BEnchmark for general Reinforcement learning algorithms.
Importantly, this code was the Reinforcement Learning part of the algorithm I developed to win the CARLA challenge on Track 2 (Cameras Only). This success showed the strength of Rainbow-IQN Ape-X as a general algorithm.
- Python 3.5+
- PyTorch >= 0.4.1
- CUDA 9.0 or higher
- redis (link to install for Ubuntu 16)
To install all dependencies with Anaconda, run:
$ conda env create -f environment.yml
If you don't use Anaconda, install PyTorch and then install the following packages with pip: atari-py, redlock-py, plotly, opencv-python.
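For example:
$ pip install atari-py redlock-py plotly opencv-python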
You can take a look at the Dockerfile if you are uncertain about the steps needed to install this project.
Afterwards, you can install the package with:
$ pip install --editable ./rainbow-iqn-apex
You will then be able to use functions and classes from this project in other projects. If you make changes to the source files, those changes will be picked up the next time you restart the Python interpreter (or reload the package with importlib):
import rainbowiqn
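To check that the editable install points at your working copy, a quick sanity check (nothing project-specific is assumed beyond the package name):

import rainbowiqn
print(rainbowiqn.__file__)  # should print a path inside your rainbow-iqn-apex checkout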
Uninstall it with:
$ pip uninstall rainbow-iqn-apex
This code has been tested on Ubuntu 16 and 18.
Open 3 terminals to sanity-check that everything is working (this will launch an experiment with one actor on space_invaders):
# Terminal 1. This launches the redis server on port 6379.
$ redis-server redis_rainbow_6379.conf
# Terminal 2. This launches the learner.
$ python rainbowiqn/launch_learner.py --memory-capacity 100000 \
--learn-start 8000 \
--log-interval 2500
# Terminal 3. This launches the actor.
$ python rainbowiqn/launch_actor.py --id-actor 0 \
--memory-capacity 100000 \
--learn-start 8000 \
--log-interval 2500
If, after a short time (about a minute), you see logs like the following appearing in the learner and actor terminals, everything is OK!
[2019-08-12T17:40:11] T = 12500 / 50000000
Time between 2 log_interval for learner (14.410 sec) # (for the learner)
[2019-08-12T17:40:06] T = 12500 / 50000000
Time between 2 log_interval for actor 0 (13.249 sec) # (for the actor)
Kill all 3 terminals afterwards and see the wiki to learn how to launch experiments for real!
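If the learner or actor never prints these logs, it is worth confirming that the redis server from Terminal 1 is reachable. Here is a minimal check using the standard redis-py client (host localhost is an assumption for a single-machine setup; the port matches redis_rainbow_6379.conf):

import redis

# Ping the redis server started in Terminal 1.
r = redis.Redis(host="localhost", port=6379)
print(r.ping())  # True if the server is up and reachable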
To test a pretrained snapshot, download the trained weights from the release and then run the following command:
# Remove the --render flag for faster evaluation
$ python rainbowiqn/test_multiple_seed.py --model with_weight/Rainbow_IQN/space_invaders/last_model_space_invaders_50000000.pth \
--game space_invaders --render
By default, all experiments run on SABER. This includes all recommendations of Machado et al. [5] (i.e. ignoring the life signal, using sticky actions, always using the full 18-action set, and reporting results as the mean score over 100 consecutive training episodes) plus a new parameter, which we call max_stuck_time (5 minutes by default).
This parameter makes it possible to use an infinite episode length while still terminating episodes in which the agent is stuck. More details can be found in our paper Is Deep Reinforcement Learning really superhuman on Atari? [4].
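For illustration, the termination rule can be sketched as follows (a minimal sketch, not the repository's actual code; the class name, the 60 frames-per-second conversion, and taking "stuck" to mean "score has not increased" are all assumptions):

class StuckDetector:
    """End an episode when the score has not increased for max_stuck_time minutes."""

    def __init__(self, max_stuck_time_minutes=5, fps=60):
        self.max_stuck_frames = max_stuck_time_minutes * 60 * fps  # minutes -> frames
        self.best_score = float("-inf")
        self.stuck_frames = 0

    def step(self, score):
        # Reset the counter whenever the score improves; otherwise count frames.
        if score > self.best_score:
            self.best_score = score
            self.stuck_frames = 0
        else:
            self.stuck_frames += 1
        return self.stuck_frames >= self.max_stuck_frames  # True -> terminate episode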
In our paper we discuss why an infinite episode length is essential for fair and comparable results. Moreover, it allows comparison against the human world record and reveals the incredibly large gap that remains before anyone can claim superhuman performance. We showed that the use of "superhuman performance" in previous papers is indeed misleading: general RL agents are definitely far from superhuman on most Atari games!
- This codebase heavily borrows from @Kaixhin's Rainbow implementation (see the Kaixhin license there)
- @dopamine for the TensorFlow implementation of IQN (see compute_loss_iqn.py for the Dopamine license)
[1] Rainbow: Combining Improvements in Deep Reinforcement Learning
[2] Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning
[3] Distributed Prioritized Experience Replay
[4] Is Deep Reinforcement Learning really superhuman on Atari?
[5] Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents