Code for reproducing experiments in Model-Based Active Exploration, ICML 2019
Written in PyTorch v1.0. The code relies on Sacred for managing experiments and hyper-parameters.
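All of the experiment commands below use Sacred's "with" syntax to override hyper-parameters from the command line. Purely as a minimal illustration of that mechanism (the experiment name, parameters, and defaults here are made up and are not the actual configuration defined in main.py):

# toy_experiment.py -- illustrative Sacred experiment, not part of this repository
from sacred import Experiment

ex = Experiment('toy_experiment')

@ex.config
def config():
    env_noise_stdev = 0.0        # defaults, overridable from the CLI
    exploration_mode = 'active'

@ex.automain
def main(env_noise_stdev, exploration_mode):
    print(env_noise_stdev, exploration_mode)

# Run with overrides, mirroring the commands in this README:
#   python toy_experiment.py with env_noise_stdev=0.02 exploration_mode=reactive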
Code layout:
- envs/: contains the environments used.
- main.py: contains the main algorithm and the baselines, selected through modes.
- models.py: a fast parallel implementation of an ensemble of models trained with a negative log-likelihood loss (see the sketch after this list).
- utilities.py: contains all the utilities (exploration objectives) used in the paper.
- imagination.py: contains code that constructs a virtual MDP using the model ensemble.
- sac.py: contains a simple Soft Actor-Critic implementation.
- sacred_fetcher.py: script to download experiment artifacts stored in MongoDB (a rough fetching illustration appears at the end of this README).
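The actual ensemble and exploration objectives live in models.py and utilities.py; the sketch below only illustrates the two ideas named in the list above, i.e. a small ensemble of probabilistic models trained with a Gaussian negative log-likelihood and a simple disagreement-style signal computed from it. All class names, network sizes, and shapes are placeholders, not the repository's code.

import torch
import torch.nn as nn

class ProbabilisticModel(nn.Module):
    """Predicts a diagonal Gaussian over the next state (illustrative)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * state_dim),
        )

    def forward(self, state, action):
        mean, log_var = self.net(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        return mean, log_var

def gaussian_nll(mean, log_var, target):
    # Negative log-likelihood of the target under the predicted diagonal Gaussian
    # (up to an additive constant).
    return 0.5 * (((target - mean) ** 2) * torch.exp(-log_var) + log_var).sum(-1).mean()

def ensemble_disagreement(means):
    # A crude disagreement signal: variance of the ensemble means per input.
    # This stands in for, but is not, the utilities implemented in utilities.py.
    return torch.stack(means).var(dim=0).sum(-1)

# Toy usage with random data.
state_dim, action_dim, batch = 4, 2, 8
ensemble = [ProbabilisticModel(state_dim, action_dim) for _ in range(3)]
optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in ensemble]

s = torch.randn(batch, state_dim)
a = torch.randn(batch, action_dim)
s_next = torch.randn(batch, state_dim)

means = []
for model, opt in zip(ensemble, optimizers):
    mean, log_var = model(s, a)
    loss = gaussian_nll(mean, log_var, s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
    means.append(mean.detach())

print(ensemble_disagreement(means))   # one disagreement value per transition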
Setup:
- Install the required system dependencies:
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
- Create the conda environment with the required dependencies:
conda env create -f conda_env.yml
- Download and set up the MuJoCo binaries. The project uses mujoco and mujoco_py version 1.50.
mkdir ~/.mujoco/
cd ~/.mujoco/
wget -c https://www.roboti.us/download/mjpro150_linux.zip
unzip mjpro150_linux.zip
rm mjpro150_linux.zip
- Obtain a MuJoCo license key and place it in the ~/.mujoco/ directory created above with the filename mjkey.txt.
- Append the following to ~/.bashrc:
# MuJoCo
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<USER>/.mujoco/mjpro150/bin
if [ -f /usr/lib/x86_64-linux-gnu/libGLEW.so ]; then
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<USER>/.mujoco/mjpro150/bin:/usr/lib/nvidia-390
    export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-375
fi
- Quick test of the MuJoCo installation:
>>> import gym
>>> gym.make('HalfCheetah-v2')
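Optionally, a slightly fuller smoke test; this snippet is not part of the repository and assumes the gym API of that generation, where reset() returns an observation and step() returns a 4-tuple:

import gym

env = gym.make('HalfCheetah-v2')
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())   # one random step
print(obs.shape, reward, done)
env.close()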
Execute the commands listed below from the code directory to reproduce the results.
HalfCheetah experiments (the default environment):
- MAX:
python main.py with max_explore env_noise_stdev=0.02
- Trajectory Variance Active Exploration:
python main.py with max_explore utility_measure=traj_stdev policy_explore_alpha=0.2 env_noise_stdev=0.02
- Renyi Divergence Reactive Exploration:
python main.py with max_explore exploration_mode=reactive env_noise_stdev=0.02
- Prediction Error Reactive Exploration:
python main.py with max_explore exploration_mode=reactive utility_measure=pred_err policy_explore_alpha=0.2 env_noise_stdev=0.02
- Random Exploration:
python main.py with random_explore env_noise_stdev=0.02
MagellanAnt-v2 (Ant) experiments:
- MAX:
python main.py with max_explore env_name=MagellanAnt-v2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
- Trajectory Variance Active Exploration:
python main.py with max_explore env_name=MagellanAnt-v2 utility_measure=traj_stdev policy_explore_alpha=0.2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
- Renyi Divergence Reactive Exploration:
python main.py with max_explore env_name=MagellanAnt-v2 exploration_mode=reactive env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
- Prediction Error Reactive Exploration:
python main.py with max_explore env_name=MagellanAnt-v2 exploration_mode=reactive utility_measure=pred_err policy_explore_alpha=0.2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
- Random Exploration:
python main.py with random_explore env_name=MagellanAnt-v2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
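If runs are recorded to MongoDB with Sacred's MongoObserver, their artifacts can be pulled back with sacred_fetcher.py. Purely as a rough illustration of that kind of retrieval (the database name, run id, and artifact schema below are assumptions; the repository's sacred_fetcher.py is the authoritative version):

import gridfs
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['sacred']                      # assumed database name
fs = gridfs.GridFS(db)                     # Sacred stores artifact files in GridFS

run = db.runs.find_one({'_id': 1})         # assumed run id
for artifact in run.get('artifacts', []):
    with open(artifact['name'], 'wb') as f:
        f.write(fs.get(artifact['file_id']).read())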
Magellan is the internal code name of the project, inspired by the life of Ferdinand Magellan.