This repository contains a collection of environments and implementations of several algorithms for action-constrained deep reinforcement learning. The code is organized for ease of use and experimentation.
A detailed description of the experiments can be found in our arXiv preprint (see the citation below).
We recommend a Docker-based installation using the provided Dockerfile:

```sh
docker build -t action_constrained_rl --build-arg USERNAME=${USERNAME} --build-arg USER_UID=${USER_UID} .
docker run --gpus all -it -v $(pwd):/workspace/action_constrained_rl action_constrained_rl:latest
```
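If no GPU (or the NVIDIA container toolkit) is available, the container can typically still be started by dropping the `--gpus all` flag, in which case training runs on CPU:

```sh
# CPU-only variant of the run command above (assumes the image was built as shown)
docker run -it -v $(pwd):/workspace/action_constrained_rl action_constrained_rl:latest
```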
The repository contains the following algorithms for action-constrained RL. Our implementation is built on top of Stable-Baselines3. For details of the algorithms, please refer to our paper cited below.
TD3 Family | Description |
---|---|
DPro | TD3 with critic trained using projected actions |
DPro+ | DPro with the penalty term |
DPre | TD3 with pre-projected actions |
DPre+ | DPre with penalty term |
DOpt | TD3 with optimization layer |
DOpt+ | DOpt with penalty term |
NFW | NFWPO with TD3 techniques (clipped double Q-learning, target policy smoothing, and delayed policy updates) |
DAlpha | TD3 with α-projection |
DRad | TD3 with radial squashing |
SAC Family | Description |
---|---|
SPre | SAC with pre-projected actions |
SPre+ | SPre with penalty term |
SAlpha | SAC with α-projection |
SRad | SAC with radial squashing |
The repository contains the following environment and constraint combinations (see the paper for the precise definition of each constraint):

| Environment | Name | Constraint |
|---|---|---|
| Reacher | R+N | No additional constraint |
| | R+L2 | |
| | R+O03 | |
| | R+O10 | |
| | R+O30 | |
| | R+M | |
| | R+T | |
| HalfCheetah | HC+O | |
| | HC+MA | |
| Hopper | H+M | |
| | H+O+S | |
| Walker2d | W+M | |
| | W+O+S | |
To run the DPre algorithm on the R+L2 task with a random seed of 1 and log the results to `logs/R+L2-DPre-1`, execute the following command:

```sh
python3 -m train --log_dir logs/R+L2-DPre-1 --prob_id R+L2 --algo_id DPre --seed 1
```
Tasks, algorithms, and other hyperparameters can all be specified explicitly through command-line arguments.
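For example, runs over several seeds can be launched with a simple shell loop such as the following sketch, which reuses the `logs/<prob_id>-<algo_id>-<seed>` naming convention above and produces the ten runs expected by the evaluation step below:

```sh
# Train DPre on R+L2 for seeds 1 through 10, logging each run to its own directory
for seed in $(seq 1 10); do
    python3 -m train --log_dir logs/R+L2-DPre-$seed --prob_id R+L2 --algo_id DPre --seed $seed
done
```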
When experiments with seeds 1-10 are logged in `logs/R+L2-DPre-1`, ..., `logs/R+L2-DPre-10`, run:

```sh
python3 -m evaluation --log_dir logs/R+L2-DPre --prob_id R+L2 --algo_id DPre
```

The evaluation results are then stored in `logs/R+L2-DPre`.
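The same pattern should carry over to the other task/algorithm combinations listed above; for instance, assuming `HC+O` and `SRad` are accepted as `--prob_id` and `--algo_id` values in the same way as `R+L2` and `DPre`:

```sh
# Hypothetical example: aggregate SRad results on HC+O,
# assuming logs/HC+O-SRad-1, ..., logs/HC+O-SRad-10 already exist
python3 -m evaluation --log_dir logs/HC+O-SRad --prob_id HC+O --algo_id SRad
```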
If you find this repository useful, please cite:

```bibtex
@article{kasaura2023benchmarking,
  title={Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints},
  author={Kasaura, Kazumi and Miura, Shuwa and Kozuno, Tadashi and Yonetani, Ryo and Hoshino, Kenta and Hosoe, Yohei},
  journal={arXiv preprint arXiv:2304.08743},
  year={2023}
}
```