This repository contains an implementation of Learning with AMIGo: Adversarially Motivated Intrinsic GOals.
The method described in the AMIGo paper listed below is implemented in `monobeast/minigrid/monobeast_amigo.py` of this repository. Please consult that file for details of the teacher and student policies, the losses used to train them, and other aspects of training. The student policy is created in the class `MinigridNet`, and the teacher policy in the class `Generator`. The training loop is defined in `train()` and is divided into `act()`, which collects the batches generated by the actors, and `learn()`, which updates the learner using V-trace. Training is based on the TorchBeast implementation of IMPALA (the Monobeast version).
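For orientation before reading that file, here is a minimal, paper-level sketch of the adversarial reward structure the teacher and student are trained on. It is not the repository's code; the names, constants, and the treatment of unreached goals below are illustrative assumptions.

```python
# Paper-level sketch of AMIGo's adversarial reward structure (NOT the code in
# monobeast_amigo.py; names, constants, and the treatment of unreached goals
# are illustrative assumptions).

def student_intrinsic_reward(reached_goal: bool) -> float:
    """The student is rewarded for reaching the goal proposed by the teacher."""
    return 1.0 if reached_goal else 0.0


def teacher_reward(steps_to_goal, t_star: int, alpha: float = 1.0, beta: float = 0.3) -> float:
    """The teacher is rewarded for goals that are challenging but achievable.

    steps_to_goal: number of steps the student needed to reach the goal, or
        None if it never reached it within the episode.
    t_star: difficulty threshold, raised over the course of training as the
        student improves.
    beta: penalty magnitude, loosely corresponding to --generator_reward_negative.
    """
    if steps_to_goal is not None and steps_to_goal >= t_star:
        return alpha   # goal was hard enough: the student needed at least t* steps
    return -beta       # goal reached too quickly, or (in this sketch) never reached


# With t* = 10, a goal reached in 4 steps penalizes the teacher,
# while one reached in 25 steps rewards it.
assert teacher_reward(4, t_star=10) == -0.3
assert teacher_reward(25, t_star=10) == 1.0
```

Raising the threshold as the student succeeds is what pushes the teacher towards proposing progressively harder goals, which in turn drives the student's exploration.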
If you have any questions, or feel the code would benefit from additional clarifying comments, please do not hesitate to raise an issue.
If you use AMIGo in your research and find it helpful, or are comparing against our results, please consider citing the following paper:
@article{campero2020learning,
title={Learning with AMIGo: Adversarially Motivated Intrinsic Goals},
author={Campero, Andres and Raileanu, Roberta and K{\"u}ttler, Heinrich and Tenenbaum, Joshua B and Rockt{\"a}schel, Tim and Grefenstette, Edward},
journal={arXiv preprint arXiv:2006.12122},
year={2020}
}
# create a new conda environment
conda create -n amigo python=3.7
conda activate amigo
# install dependencies
git clone git@github.com:facebookresearch/adversarially-motivated-intrinsic-goals.git
cd adversarially-motivated-intrinsic-goals
pip install -r requirements.txt
# Run AMIGo on MiniGrid Environment
OMP_NUM_THREADS=1 python -m monobeast.minigrid.monobeast_amigo --env MiniGrid-ObstructedMaze-1Q-v0 \
--num_actors 40 --modify --generator_batch_size 150 --generator_entropy_cost .05 \
--generator_reward_negative -.3 --disable_checkpoint \
--savedir ./experimentMinigrid
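Training results are written under the directory passed to `--savedir`. The following is a small, optional sketch for inspecting a finished run; it assumes the usual TorchBeast/Monobeast `FileWriter` layout of `<savedir>/<xpid>/logs.csv`, and the column names it looks for are assumptions, so check what your run actually logged and adjust accordingly.

```python
# Optional helper for inspecting a finished run. Assumes the TorchBeast/Monobeast
# FileWriter layout (<savedir>/<xpid>/logs.csv); the column names used below are
# assumptions and may differ in your logs.
import glob

import pandas as pd
import matplotlib.pyplot as plt

runs = sorted(glob.glob("./experimentMinigrid/*/logs.csv"))
assert runs, "no logs.csv found under ./experimentMinigrid"
df = pd.read_csv(runs[-1])          # most recent run
print(df.columns.tolist())          # see which quantities were actually logged

if {"frames", "mean_episode_return"} <= set(df.columns):
    df.plot(x="frames", y="mean_episode_return")
    plt.title("AMIGo on MiniGrid-ObstructedMaze-1Q-v0")
    plt.show()
```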
We used an open-sourced implementation of the exploration baselines (i.e., RIDE, RND, ICM, and Count). This code should be cloned into a separate local repository and run in its own conda environment.
# create a new conda environment
conda create -n ride python=3.7
conda activate ride
# install dependencies
git clone git@github.com:facebookresearch/impact-driven-exploration.git
cd impact-driven-exploration
pip install -r requirements.txt
To reproduce the baseline results in the paper, run:
OMP_NUM_THREADS=1 python main.py --env MiniGrid-ObstructedMaze-1Q-v0 \
--intrinsic_reward_coef 0.01 --entropy_cost 0.0001
with the corresponding best values of `--intrinsic_reward_coef` and `--entropy_cost` reported in the paper for each model.
Set `--model` to `ride`, `rnd`, `curiosity`, or `count` for RIDE, RND, ICM, or Count, respectively.
Set `--use_fullobs_policy` to use a full view of the environment as input to the policy network.
Set `--use_fullobs_intrinsic` to use full views of the environment when computing the intrinsic reward.
The default uses a partial view of the environment for both the policy and the intrinsic reward.
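Putting the flags above together, the small sketch below simply prints one command per baseline. The hyperparameter values in it are placeholders, not the per-model best values reported in the paper.

```python
# Enumerate baseline commands from the flags described above. The hyperparameter
# values are placeholders -- substitute the best --intrinsic_reward_coef and
# --entropy_cost reported in the paper for each model.
ENV = "MiniGrid-ObstructedMaze-1Q-v0"
PLACEHOLDERS = {"intrinsic_reward_coef": 0.01, "entropy_cost": 0.0001}

for model in ["ride", "rnd", "curiosity", "count"]:
    cmd = (
        f"OMP_NUM_THREADS=1 python main.py --model {model} --env {ENV} "
        f"--intrinsic_reward_coef {PLACEHOLDERS['intrinsic_reward_coef']} "
        f"--entropy_cost {PLACEHOLDERS['entropy_cost']}"
    )
    # Each command is a long training run; launch them as separate jobs rather
    # than sequentially in a single process.
    print(cmd)
```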
The code in this repository is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).