A simple reinforcement learning benchmark framework
- Python 3.7+
- PyTorch 1.11.0+
- stable-baselines3 (sb3-contrib) 1.6.0+
Tested on Ubuntu 20.04 LTS only.
Create the conda environment, then execute `setup.sh`. This may require sudo privileges; you may be asked for your sudo password during installation.
conda create -n rlbench python=3.9.7
conda activate rlbench
git clone https://github.com/HRKimLab/RLbench.git
cd RLbench/
sh setup.sh
If you want to use your GPU for training, install a CUDA toolkit version that matches your GPU and your PyTorch build.
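To verify that PyTorch can see your GPU after installing the toolkit, a quick check (assuming a standard PyTorch install) is:

```python
import torch

# Confirms that PyTorch detects a usable GPU/CUDA setup.
print(torch.cuda.is_available())  # True if training can use the GPU
print(torch.version.cuda)         # CUDA version this PyTorch build targets
```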
After finishing the setup, change your working directory to `src/` and run the pre-defined script with the following command.
sh ../scripts/train_and_plot.sh
Training results are organized per environment. Each leaf directory name appears to encode the agent (a), hyperparameter setting (s), and run (r), with the random seed appended after the dash (e.g., `a1s2r3-53` is agent 1, setting 2, run 3 with seed 53):
LunarLanderContinuous-v2/
├── a1
│   ├── a1s1
│   │   ├── a1s1r1-7
│   │   ├── a1s1r2-42
│   │   └── a1s1r3-53
│   └── a1s2
│       ├── a1s2r1-7
│       ├── a1s2r2-42
│       └── a1s2r3-53
...
CartPole-v1/
├── a1
│   ├── a1s1
...
...
├── a1s1
│   ├── a1s1r1-0
│   │   ├── 0.monitor.csv   - Learning stats (raw)
│   │   ├── progress.csv    - Learning stats
│   │   ├── best_model.zip  - Best model parameters
│   │   ├── evaluations.npz - Evaluation stats
│   │   └── info.zip        - Other info related to learning
The plotting scripts use `*.monitor.csv` to draw plots. `*.monitor.csv` contains the reward, length, and elapsed time of each episode, while `progress.csv` holds more detailed information such as the current exploration rate, learning rate, mean episode reward, and so on.
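These outputs can also be inspected directly. A minimal sketch follows; the run path is hypothetical, and the column names and array keys assume the standard stable-baselines3 `Monitor` and `EvalCallback` formats:

```python
import numpy as np
import pandas as pd

run_dir = "CartPole-v1/a1/a1s1/a1s1r1-7"  # hypothetical run directory

# *.monitor.csv begins with a JSON comment line, so skip the first row.
# Columns: r = episode reward, l = episode length, t = elapsed time.
monitor = pd.read_csv(f"{run_dir}/0.monitor.csv", skiprows=1)
print(monitor["r"].mean())  # mean episode reward over training

# evaluations.npz follows the stable-baselines3 EvalCallback format.
evals = np.load(f"{run_dir}/evaluations.npz")
print(evals["timesteps"])             # timesteps at which evaluations ran
print(evals["results"].mean(axis=1))  # mean evaluation reward per checkpoint
```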
At `src/`, run:
python train.py --env [ENV_NAME] \
--algo [ALGORITHM_NAME] \
--hp [CONFIG_PATH] \
--nseed [NUMBER_OF_EXPS] \
--nstep [N_TIMESTEPS] \
--eval_freq [EVAL_FREQ] \
--eval_eps [N_EVAL_EPISODES]
Example:
python train.py --env CartPole-v1 \
--algo dqn \
--hp default/dqn \
--nseed 3 \
--nstep 100000
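For reference, these flags map onto standard stable-baselines3 concepts. The following is an illustrative sketch of an equivalent raw SB3 run, not the repository's actual `train.py`:

```python
import gym
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.monitor import Monitor

# Monitor wrappers are what produce the *.monitor.csv files.
env = Monitor(gym.make("CartPole-v1"))        # --env
eval_env = Monitor(gym.make("CartPole-v1"))

eval_cb = EvalCallback(
    eval_env,
    eval_freq=10_000,           # --eval_freq
    n_eval_episodes=5,          # --eval_eps
    best_model_save_path=".",   # writes best_model.zip
    log_path=".",               # writes evaluations.npz
)

model = DQN("MlpPolicy", env, seed=7)  # --algo dqn; one seed per run (--nseed runs in total)
model.learn(total_timesteps=100_000, callback=eval_cb)  # --nstep
```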
For more information, use the `--help` option:
python train.py --help
The current implementation only supports running multiple experiments with the same hyperparameters. Modify the hyperparameters in `scripts/run_multiple_trains.py` as needed.
Then run the following command from `src/`:
python ../scripts/run_multiple_trains.py
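As a hypothetical illustration only (the actual contents of `run_multiple_trains.py` may differ), such a driver script could loop over algorithms and invoke `train.py` with the documented flags; the per-algorithm config paths below are assumptions:

```python
import subprocess

# Hypothetical driver loop in the spirit of run_multiple_trains.py:
# launch train.py for several algorithms with the same hyperparameters.
for algo in ["dqn", "a2c", "ppo"]:
    subprocess.run(
        ["python", "train.py",
         "--env", "CartPole-v1",
         "--algo", algo,
         "--hp", f"default/{algo}",   # assumed per-algorithm default configs
         "--nseed", "3",
         "--nstep", "100000"],
        check=True,
    )
```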
At `src/`, run:
python plot/plot_mean_combined.py --env [ENV_NAME] \
--agents [AGENT_LIST] \
--x [X-AXIS] \
--y [Y-AXIS]
Example:
python plot/plot_mean_combined.py --env LunarLanderContinuous-v2 \
--agents "[a1s1r1,a2s1r1,a3s1r1,a4s1r1,a5s1r1,a6s1r1,a7s1r1,a8s1r1]" \
--x timesteps \
--y rew
For more information, use the `--help` option:
python plot/plot_mean_combined.py --help
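If you prefer to build a plot by hand, here is a minimal sketch (not `plot_mean_combined.py` itself, and the paths are hypothetical) that averages episode reward across the seed runs of one setting:

```python
import glob

import matplotlib.pyplot as plt
import pandas as pd

# Gather the per-seed monitor files of one agent/setting.
paths = glob.glob("LunarLanderContinuous-v2/a1/a1s1/a1s1r*/0.monitor.csv")
runs = [pd.read_csv(p, skiprows=1)["r"] for p in paths]  # skip the JSON header line

# Truncate to a common episode count, then average across seeds.
n = min(len(r) for r in runs)
mean_rew = pd.concat([r.iloc[:n].reset_index(drop=True) for r in runs], axis=1).mean(axis=1)

plt.plot(mean_rew)
plt.xlabel("episode")
plt.ylabel("mean episode reward over seeds")
plt.show()
```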
To render the Q-values of a trained agent, run the following at `src/`:
python render_q_value.py