This repository is the official implementation of SAUNA, accepted at IJCAI 2020.
The SAUNA function is implemented in baselines/common/policies.py, while its associated loss is computed in baselines/ppo2/model.py. Other dependencies exist elsewhere in the code.
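For orientation only, the two command-line arguments exposed by this implementation (--sauna_thres and --sauna_coef) can be pictured as a per-sample filtering threshold and a coefficient weighting an additional loss term. The sketch below is a rough illustration with hypothetical names (filter_samples, combined_loss, scores); the actual criterion and loss are the ones defined in the files above and in the paper.

import numpy as np

def filter_samples(scores, sauna_thres=0.3):
    # Keep only the samples whose (hypothetical) per-sample score passes the
    # filtering threshold (--sauna_thres); the real criterion lives in
    # baselines/common/policies.py.
    return np.asarray(scores) >= sauna_thres

def combined_loss(pg_loss, vf_loss, sauna_loss, vf_coef=0.5, sauna_coef=0.5):
    # Weight the extra SAUNA term with --sauna_coef on top of the usual PPO2
    # objective; the real combination lives in baselines/ppo2/model.py.
    return pg_loss + vf_coef * vf_loss + sauna_coef * sauna_loss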
SAUNA requires Python 3 (>=3.5).
From the general python package sanity perspective, it is a good idea to use virtual environments (virtualenvs) to make sure packages from different projects do not interfere with each other. You can install virtualenv (which is itself a pip package) via
pip install virtualenv
Virtualenvs are essentially folders that have copies of the Python executable and all Python packages. To create a virtualenv called venv with python3, one runs
virtualenv /path/to/venv --python=python3
To activate a virtualenv:
. /path/to/venv/bin/activate
- Clone the repo and cd into it:
git clone https://github.com/yfletberliac/SAUNA.git
cd SAUNA
- If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. Version 1.15.2 is preferred. In most cases,
pip install tensorflow-gpu # if you have a CUDA-compatible gpu and proper drivers
or
pip install tensorflow
should be sufficient. Refer to the TensorFlow installation guide for more details.
- Install the SAUNA package:
pip install -e .
MuJoCo (Multi-Joint dynamics with Contact) physics simulator is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here.
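Once the installation steps above are done, a quick sanity check (not part of the repository, just a suggestion) is to import the main dependencies and build a MuJoCo environment from Python:

import tensorflow as tf
import gym
import baselines  # available after pip install -e .

print(tf.__version__)             # 1.15.2 is the preferred version
env = gym.make('HalfCheetah-v2')  # fails if mujoco_py or the MuJoCo license is missing
print(env.observation_space, env.action_space)
env.close()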
You can start training on the environment of your choice (e.g. HalfCheetah-v2) like so:
python -m baselines.run --alg=ppo2 --env=HalfCheetah-v2 --num_timesteps=1e6 --sauna_thres 0.3 --sauna_coef 0.5
You can set the arguments --sauna_thres (filtering threshold) and --sauna_coef (loss function coefficient) to the desired values. Default values are 0.3 and 0.5, respectively.
To set the number of parallel environments for your machine, use --num_env. It defaults to 1 for MuJoCo tasks.
To run sets of experiments (different envs & different seeds):
bash run.sh
To save:
python -m baselines.run --alg=ppo2 --env=HalfCheetah-v2 --num_timesteps=1e6 --save_path=models/mujoco/halfcheetah_1M_SAUNA
To load:
python -m baselines.run --alg=ppo2 --env=HalfCheetah-v2 --num_timesteps=1e6 --load_path=~/models/halfcheetah_1M_SAUNA
To play:
python -m baselines.run --alg=ppo2 --env=HalfCheetah-v2 --num_timesteps=0 --load_path=~/models/halfcheetah_1M_SAUNA --play
Run a single-seeded experiment and save the logs in e.g. logs/halfcheetah_1M_SAUNA:
SAUNA_LOGDIR=logs/halfcheetah_1M_SAUNA/seed101 SAUNA_LOG_FORMAT=csv,tensorboard,stdout python -m baselines.run --alg=ppo2 --env=HalfCheetah-v2 --num_timesteps=1e6 --save_path=~/models/halfcheetah_1M_SAUNA --seed 101
Run multi-seeded experiments and save the logs in e.g. logs/halfcheetah_1M_SAUNA:
bash run.sh
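run.sh takes care of looping over environments and seeds; purely as an illustration, an equivalent launcher can be written in Python with the same environment variables, where the environment list, seeds and log directory below are example values:

import os
import subprocess

envs = ['HalfCheetah-v2']   # example environments
seeds = [101, 102, 103]     # example seeds

for env_id in envs:
    for seed in seeds:
        log_env = dict(os.environ,
                       SAUNA_LOGDIR='logs/halfcheetah_1M_SAUNA/seed{}'.format(seed),
                       SAUNA_LOG_FORMAT='csv,tensorboard,stdout')
        subprocess.run(['python', '-m', 'baselines.run', '--alg=ppo2',
                        '--env={}'.format(env_id), '--num_timesteps=1e6',
                        '--seed', str(seed)],
                       env=log_env, check=True)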
You can visualize the logs on TensorBoard while training. When all experiments are finished, set the path in plot.py to logs/halfcheetah_1M_SAUNA and execute:
python plot.py
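plot.py is the supported way to produce the figures; if you want to inspect the logs by hand instead, the CSV files written by the logger can be read directly. The sketch below assumes pandas and matplotlib are available and that each seed directory contains a progress.csv with an eprewmean column (adjust the column name if your logger version differs):

import glob
import pandas as pd
import matplotlib.pyplot as plt

for csv_path in sorted(glob.glob('logs/halfcheetah_1M_SAUNA/seed*/progress.csv')):
    df = pd.read_csv(csv_path)
    plt.plot(df['eprewmean'], label=csv_path)  # mean episode reward per logger update

plt.xlabel('logger updates')
plt.ylabel('mean episode reward')
plt.legend()
plt.show()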
The code has been forked from OpenAI baselines.
@inproceedings{ijcai2020-376,
title = {Only Relevant Information Matters: Filtering Out Noisy Samples To Boost RL},
author = {Flet-Berliac, Yannis and Preux, Philippe},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
pages = {2711--2717},
year = {2020},
month = {7},
doi = {10.24963/ijcai.2020/376},
url = {https://doi.org/10.24963/ijcai.2020/376},
}