
A simple reinforcement learning benchmark framework (under development)





  • Python 3.7+
  • PyTorch 1.11.0+
  • stable-baselines3 (sb3-contrib) 1.6.0+


Tested on Ubuntu 20.04 LTS only.

Create the conda environment, then run setup.sh. The script may need sudo privileges, so you may be prompted for your sudo password during installation.

conda create -n rlbench python=3.9.7
conda activate rlbench

git clone https://github.com/HRKimLab/RLbench.git
cd RLbench/

sh setup.sh

If you want to use your GPU for training, install a CUDA toolkit version that matches your GPU.

Quick start

After finishing the setup, change into the src/ directory and run the predefined script with the following command.

sh ../scripts/train_and_plot.sh

Directory structure of data files

Overall structure

├── a1
│   ├── a1s1
│   │   ├── a1s1r1-7
│   │   ├── a1s1r2-42
│   │   └── a1s1r3-53
│   └── a1s2
│       ├── a1s2r1-7
│       ├── a1s2r2-42
│       └── a1s2r3-53
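The directory names above follow a regular pattern, so they can be split into their parts programmatically. The sketch below assumes the pattern is a<agent>s<session>r<run>-<seed>; that reading of the letters and of the trailing number is an assumption, not something the repository states here, and `parse_run_dir` is a hypothetical helper that is not part of RLbench.

```python
import re
from typing import NamedTuple

# Assumed naming scheme: a<agent>s<session>r<run>-<seed>, e.g. "a1s1r1-7".
RUN_DIR_PATTERN = re.compile(r"^a(\d+)s(\d+)r(\d+)-(\d+)$")


class RunId(NamedTuple):
    agent: int
    session: int
    run: int
    seed: int


def parse_run_dir(name: str) -> RunId:
    """Split a run directory name into its numeric components."""
    match = RUN_DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a run directory name: {name!r}")
    return RunId(*(int(group) for group in match.groups()))
```

For example, `parse_run_dir("a1s2r3-53")` yields a tuple with agent 1, session 2, run 3, and seed 53 under the assumed scheme.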


Internal files

├── a1s1
│   ├── a1s1r1-0
│   │   ├── 0.monitor.csv   - Learning stats (raw)
│   │   ├── progress.csv    - Learning stats
│   │   ├── best_model.zip  - Best model parameters
│   │   ├── evaluations.npz - Evaluation stats
│   │   └── info.zip        - Other info related to learning

Plots are drawn from *.monitor.csv, which records the reward, episode length, and elapsed time of each episode. progress.csv holds more detailed training statistics, such as the current exploration rate, learning rate, and mean episode reward.
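As a rough illustration of what a monitor file contains, the sketch below reads one with the standard library only. It assumes the usual stable-baselines3 Monitor layout: a first line starting with `#` that holds JSON metadata, followed by a CSV table with the columns `r` (reward), `l` (episode length), and `t` (elapsed time). `load_monitor` is a hypothetical helper, not part of RLbench.

```python
import csv
import json


def load_monitor(path):
    """Return (metadata, episodes) read from a Monitor CSV file."""
    with open(path, newline="") as f:
        first = f.readline()
        if first.startswith("#"):
            # '#'-prefixed JSON header written by the Monitor wrapper.
            metadata = json.loads(first[1:])
        else:
            # No header line: rewind so the CSV reader sees the column names.
            metadata = {}
            f.seek(0)
        episodes = [
            {"r": float(row["r"]), "l": int(row["l"]), "t": float(row["t"])}
            for row in csv.DictReader(f)
        ]
    return metadata, episodes
```

Each entry in `episodes` is one finished episode, so summing the `l` values gives the number of timesteps trained so far.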

How to use


To train an agent, run the following from src/:

python train.py --env [ENV_NAME] \
    --algo [ALGORITHM_NAME] \
    --hp [CONFIG_PATH] \
    --nseed [NUMBER_OF_EXPS] \
    --nstep [N_TIMESTEPS] \
    --eval_freq [EVAL_FREQ] \
    --eval_eps [N_EVAL_EPISODES]

For example:

python train.py --env CartPole-v1 \
    --algo dqn \
    --hp default/dqn \
    --nseed 3 \
    --nstep 100000

For more information, use the --help option:
python train.py --help

Train with multiple algorithms and environments

The current implementation only supports running multiple experiments with the same hyperparameters.

Edit the hyperparameters in scripts/run_multiple_trains.py as needed, then run the following command from src/:

python ../scripts/run_multiple_trains.py
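To make the idea of one shared setting applied to every environment and algorithm pair concrete, here is a minimal sketch that builds one train.py command per pair. It is not the content of scripts/run_multiple_trains.py, which is not shown here. `build_commands` is a hypothetical helper, and the `default/<algo>` hyperparameter path is an assumption extrapolated from the `default/dqn` example above.

```python
import itertools


def build_commands(envs, algos, nseed, nstep):
    """Build one train.py argument list per (env, algo) pair.

    The same nseed and nstep are shared by every experiment.
    """
    commands = []
    for env, algo in itertools.product(envs, algos):
        commands.append([
            "python", "train.py",
            "--env", env,
            "--algo", algo,
            "--hp", f"default/{algo}",  # assumed config path layout
            "--nseed", str(nseed),
            "--nstep", str(nstep),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["CartPole-v1"], ["dqn"], nseed=3, nstep=100000):
        print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run(cmd, check=True)` from src/ to launch the runs one after another.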


To plot the results, run the following from src/:

python plot/plot_mean_combined.py --env [ENV_NAME] \
    --agents [AGENT_LIST] \
    --x [X-AXIS] \
    --y [Y-AXIS]

For example:

python plot/plot_mean_combined.py --env LunarLanderContinuous-v2 \
    --agents "[a1s1r1,a2s1r1,a3s1r1,a4s1r1,a5s1r1,a6s1r1,a7s1r1,a8s1r1]" \
    --x timesteps \
    --y rew

For more information, use the --help option:
python plot/plot_mean_combined.py --help
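One way to picture a mean curve with timesteps on the x-axis and reward on the y-axis is sketched below. Runs finish episodes at different timesteps, so each run is first turned into a step curve and then sampled on a shared grid before averaging. This is an illustrative assumption about how such a curve can be computed, not a description of plot_mean_combined.py; all function names are hypothetical.

```python
from bisect import bisect_right


def step_curve(lengths, rewards):
    """Return (end_steps, rewards): the timestep at which each episode ended."""
    end_steps, total = [], 0
    for n in lengths:
        total += n
        end_steps.append(total)
    return end_steps, list(rewards)


def value_at(end_steps, rewards, x):
    """Reward of the latest episode finished at or before timestep x."""
    i = bisect_right(end_steps, x)
    return rewards[i - 1] if i > 0 else None


def mean_curve(runs, grid):
    """Average several runs on a shared grid, ignoring runs with no data yet."""
    means = []
    for x in grid:
        values = []
        for end_steps, rewards in runs:
            v = value_at(end_steps, rewards, x)
            if v is not None:
                values.append(v)
        means.append(sum(values) / len(values) if values else None)
    return means
```

Grid points that precede the first finished episode of every run come back as `None`, so the caller can drop them instead of plotting a misleading zero.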


To render Q-values, run the following from src/:

python render_q_value.py