Curiosity-driven Exploration by Self-supervised Prediction

Primary language: Python. License: MIT.

Curious RL

A re-implementation of the 2017 paper "Curiosity-driven Exploration by Self-supervised Prediction" by Deepak Pathak et al. (arXiv link). This was a class project for the course EE556 at the University of Southern California.

The original implementation by the paper's authors can be found here: https://github.com/pathak22/noreward-rl

In this implementation, the ICM (Intrinsic Curiosity Module) is added to the A2C algorithm instead of A3C, which the original paper uses. With it, the curious agent solves the "VizdoomMyWayHome-v0" environment.

In under 100M frames the curious agent was able to solve "VizdoomMyWayHomeVerySparse-v0", which vanilla A2C fails to solve even after training on 500M frames.

If the hyperparameters are not tuned well, the curious agent can get stuck in funny local optima. In one run, for example, the learning rate of the predictive model was too low, and the agent stayed forever curious about a wall with a bright texture.
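
To see why a slow predictive model can trap the agent, it helps to recall how the paper defines the curiosity reward: the squared error of the forward model's prediction of the next state's features, scaled by a factor eta. The sketch below is a minimal pure-Python illustration of that formula and of the paper's weighted ICM loss. It is not the code used in this repository; the function names and the default values of `eta` and `beta` are assumptions chosen for illustration.

```python
def intrinsic_reward(phi_next_pred, phi_next, eta=0.01):
    """Curiosity reward for one transition.

    phi_next_pred: forward-model prediction of the next state's features
    phi_next:      features actually observed for the next state
    eta:           scaling factor (illustrative default, not a tuned value)
    """
    sq_err = sum((p - t) ** 2 for p, t in zip(phi_next_pred, phi_next))
    return 0.5 * eta * sq_err


def icm_loss(forward_loss, inverse_loss, beta=0.2):
    """Weighted sum of inverse-model and forward-model losses.

    beta trades off the two terms (illustrative default).
    """
    return (1.0 - beta) * inverse_loss + beta * forward_loss


# A transition the forward model predicts badly earns a positive reward...
surprising = intrinsic_reward([1.0, 2.0], [1.0, 0.0], eta=1.0)
# ...while a perfectly predicted transition earns none.
boring = intrinsic_reward([0.0, 0.0], [0.0, 0.0], eta=1.0)
```

If the forward model learns too slowly, its error on a visually rich region such as a brightly textured wall stays high, so the intrinsic reward there never decays and the agent keeps returning to it.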

Installation

This repository uses pipenv, a tool that manages both virtualenvs and Python dependencies. Install it if you don't have it:

pip install pipenv

Clone the repo, create a virtualenv with all the packages, and activate it:

git clone https://github.com/alex-petrenko/curious-rl.git
cd curious-rl
pipenv install
pipenv shell

Run the tests:

python -m unittest

Experiments

Train curious agents in different environments. Use TensorBoard to monitor training, and stop it when necessary:

python -m algorithms.curious_a2c.train_curious_a2c --env=doom_basic
python -m algorithms.curious_a2c.train_curious_a2c --env=doom_maze
python -m algorithms.curious_a2c.train_curious_a2c --env=doom_maze_sparse
python -m algorithms.curious_a2c.train_curious_a2c --env=doom_maze_very_sparse

tensorboard --logdir ./train_dir
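
If you want to queue several of these runs, a small launcher can build the same commands programmatically. This is only a sketch: the helper names `train_command` and `run_all` are hypothetical and not part of this repository, and it assumes each training process eventually exits on its own or is stopped by you, since every run blocks until it finishes.

```python
import subprocess
import sys

# Environment names taken from the commands listed above.
ENVS = ["doom_basic", "doom_maze", "doom_maze_sparse", "doom_maze_very_sparse"]


def train_command(env):
    """Build the argument list for one training run (hypothetical helper)."""
    return [
        sys.executable,
        "-m",
        "algorithms.curious_a2c.train_curious_a2c",
        f"--env={env}",
    ]


def run_all(envs=ENVS, dry_run=True):
    """Print the commands, or run them one after another if dry_run is False."""
    for env in envs:
        cmd = train_command(env)
        if dry_run:
            print(" ".join(cmd))
        else:
            # Blocks until this training process exits; raises on failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Run it from the repository root inside the pipenv shell so that the `algorithms` package is importable.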

The latest model is saved periodically. After training to the desired performance, you can examine the agent's behavior:

python -m algorithms.curious_a2c.enjoy_curious_a2c --env=doom_maze_sparse

Sometimes, after you press Ctrl+C, some of the Doom processes do not exit, and you have to kill them manually:

kill -9 $(ps aux | grep '[t]rain_curious_a2c' | awk '{print $2}')
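
The same cleanup can be sketched in Python for cases where you want to inspect the matching processes before killing them. The helper names `matching_pids` and `kill_stale` are hypothetical, and the parser assumes the usual 11-column `ps aux` layout on Linux or macOS, with the command as the last column.

```python
import os
import signal
import subprocess


def matching_pids(ps_output, pattern, own_pid=None):
    """Return PIDs from `ps aux` text whose command contains `pattern`.

    Skips the header line, the calling process, and any `grep` process
    that happens to mention the pattern.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:
        fields = line.split(None, 10)  # 11th field is the full command
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        pid, command = int(fields[1]), fields[10]
        if pattern in command and pid != own_pid and not command.startswith("grep"):
            pids.append(pid)
    return pids


def kill_stale(pattern="train_curious_a2c"):
    """Send SIGKILL to every matching process (Unix only)."""
    out = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    for pid in matching_pids(out, pattern, own_pid=os.getpid()):
        os.kill(pid, signal.SIGKILL)
```

Calling `matching_pids` on its own first lets you check which PIDs would be affected before running `kill_stale()`.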

Training the agent on mazes may take a long time, so be patient. A detailed project report is available as a PDF.

If you have any questions about this repo, please feel free to reach me at apetrenko1991@gmail.com