Meta-Reinforcement Learning in Non-Stationary and Dynamic Environments

This is the reference implementation of the Continuous Environment Meta-Reinforcement Learning (CEMRL) algorithm. The implementation is based on rlkit and PEARL.
For our experiments we use MuJoCo200; however, due to old PEARL dependencies, older versions have to be installed as well:
- Get a MuJoCo license key and follow the instructions.
- Put your key file in `~/.mujoco`.
- Download the versions mujoco131, mujoco150, and mujoco200 and put them in `~/.mujoco`.
- Set `LD_LIBRARY_PATH` to point to all the MuJoCo binaries (`~/.mujoco/mujoco131/bin`, `~/.mujoco/mujoco150/bin`, and `~/.mujoco/mujoco200/bin`, respectively).
- Set `LD_LIBRARY_PATH` to point to your GPU drivers (something like `/usr/lib/nvidia-390`; you can find your version by running `nvidia-smi`). Example exports are sketched after this list.
- A copy of the old MuJoCo files is also included in this repository.
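As a minimal sketch, the corresponding `~/.bashrc` entries could look as follows, assuming the default install locations from the steps above and NVIDIA driver version 390 (adjust paths and driver version to your system; depending on the package, version 1.50 may unpack as `mjpro150` rather than `mujoco150`):

```bash
# Hypothetical ~/.bashrc entries; adjust paths and driver version to your setup.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco131/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco150/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin
# GPU driver libraries; find your driver version with `nvidia-smi`.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-390
```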
For the remaining dependencies, we recommend using miniconda. Use the `environment.yml` file to set up a conda virtual environment. Before running the command below, make sure you have added `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/USERNAME/.mujoco/mjpro150/bin` to your `~/.bashrc`.

```
conda env create --name cemrl --file=environment.yml
```

Make sure the correct GPU driver is installed and that you use a CUDA toolkit version matching your GPU; a quick check is sketched below.
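As a sanity check (a sketch, assuming a PyTorch-based setup, as used by rlkit and PEARL), you can verify that the driver is visible and that CUDA is usable from inside the environment:

```bash
# Driver and GPU visibility
nvidia-smi
# CUDA availability inside the conda environment (assumes PyTorch)
conda activate cemrl
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```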
- Clone the rand_param_envs repository to `/path/to/rand_param_envs`.

We created our own versions of the standard MuJoCo / Gym environments. Therefore, install the following:

- Clone the meta_rand_envs repository to `/path/to/meta_rand_envs`.
- We have tested Meta-World with CEMRL, but the results are not included in the paper.
- Clone the metaworld repository to `/path/to/metaworld`.
- Check out the commit used in this work in a new branch (a consolidated clone sketch follows below):

```
git checkout -b cemrl_basis 5bcc76e1d455b8de34a044475c9ea3979ca53e2d
```
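Taken together, the clone steps might look like this (a sketch; the repository URLs are placeholders to be replaced with the actual locations, and the target paths are up to you):

```bash
# Placeholder URLs and paths; substitute the actual repository locations.
git clone <rand_param_envs-url> /path/to/rand_param_envs
git clone <meta_rand_envs-url> /path/to/meta_rand_envs
git clone <metaworld-url> /path/to/metaworld
# Pin metaworld to the commit used in this work
cd /path/to/metaworld
git checkout -b cemrl_basis 5bcc76e1d455b8de34a044475c9ea3979ca53e2d
```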
Install all previous dependencies into the conda environment in dev mode:

```
cd /path/to/dependency
conda activate cemrl
pip install -e .
```
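For example, to install all three environment repositories in one go (a sketch, assuming the paths from the clone steps above):

```bash
conda activate cemrl
# Dev-mode install of each cloned dependency
for dep in /path/to/rand_param_envs /path/to/meta_rand_envs /path/to/metaworld; do
    pip install -e "$dep"
done
```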
This installation has been tested only on 64-bit Ubuntu 18.04.
To reproduce an experiment, run:

```
conda activate cemrl
python runner.py configs/[EXP].json
# Options:
--use_mp            # parallelize data collection across num_workers
--num_workers=8     # configure number of workers, default: 4
--gpu=2             # configure GPU number, default: no GPU
```
A working starting example is `python runner.py configs/cheetah-stationary-vel.json`; a parallelized variant is sketched below.
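Combining the options above, a parallelized run on the first GPU could look like this (a sketch; choose the worker count and GPU index to match your machine):

```bash
conda activate cemrl
python runner.py configs/cheetah-stationary-vel.json --use_mp --num_workers=8 --gpu=0
```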
Experiments in `configs/others` are deprecated and might not work.
- Output files will be written to `./output/[ENV]/[EXP NAME]`, where the experiment name is uniquely generated based on the date. The file `progress.csv` contains statistics logged over the course of training, `variant.json` documents the used parameters, and further files contain pickled data for specific epochs, such as weights and encodings.
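For example, after the cheetah run above, you could inspect a finished experiment like this (a sketch; the timestamped experiment name is hypothetical):

```bash
# The experiment directory name below is made up; use the one generated for your run.
ls ./output/cheetah-stationary-vel/2020_01_01_12_00_00/
# Peek at the logged training statistics and the documented parameters
head ./output/cheetah-stationary-vel/2020_01_01_12_00_00/progress.csv
cat ./output/cheetah-stationary-vel/2020_01_01_12_00_00/variant.json
```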
With the script `cemrl/analysis/analysis_runner.py` you can do the following: log to a database, plot rewards, plot encodings, showcase the learned policy, etc.
- Copy the path of a specific experiment to `path_to_weights` in `cemrl/configs/analysis_config.py` and select which parts of the analysis should be done (by setting them to `True` in the `analysis_params`).
- Run:

```
python cemrl/analysis/analysis_runner.py
```
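A hypothetical walkthrough of these two steps (the experiment path and the exact assignment syntax inside `analysis_config.py` are assumptions, not verified against the file):

```bash
# 1) In cemrl/configs/analysis_config.py, set (hypothetical syntax)
#      path_to_weights = './output/cheetah-stationary-vel/2020_01_01_12_00_00'
#    and flip the desired analysis_params entries to True.
# 2) Then run the analysis inside the conda environment:
conda activate cemrl
python cemrl/analysis/analysis_runner.py
```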
- Most relevant code for the `cemrl` algorithm is in the folder `./cemrl`.
- We use ray for data collection parallelization. Make sure to configure a suitable number of `num_workers` so the program does not crash.
- Experiments are configured via `json` configuration files located in `./configs`, which build on the basic default configuration in `./configs/default.py`.
- Environment wrappers are located in `rlkit/envs`.
- The algorithm option `combination_trainer` is deprecated and not supported.
- Adjust `max_replay_buffer_size` according to the amount of collected data and the available memory.