RL Reach is a platform for running reproducible reinforcement learning experiments. Training environments are provided to solve the reaching task with the WidowX MK-II robotic arm. The Gym environments and training scripts are adapted from Replab and Stable Baselines Zoo, respectively.
Four training environments are available:
- widowx_reacher-v1: fixed goal
- widowx_reacher-v2: fixed goal, goal-oriented environment (compatible with HER)
- widowx_reacher-v3: random goal
- widowx_reacher-v4: random goal, goal-oriented environment (compatible with HER)
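For orientation, the snippet below shows how these environments are typically instantiated once installed (see the installation steps that follow). This is a minimal sketch: the `import gym_envs` line assumes that importing the installed package registers the environment IDs, and the goal-oriented observation keys follow Gym's standard GoalEnv convention.

```python
import gym
import gym_envs  # assumed import name; registers the widowx_reacher IDs after `pip install -e .`

# Plain environment: observations are a flat array
env = gym.make("widowx_reacher-v1")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()

# Goal-oriented environment (HER-compatible): observations are a dict
# following Gym's GoalEnv convention
goal_env = gym.make("widowx_reacher-v2")
obs = goal_env.reset()
print(obs["observation"], obs["achieved_goal"], obs["desired_goal"])
goal_env.close()
```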
- Clone the repository:

```
git clone https://github.com/PierreExeter/rl_reach.git && cd rl_reach/
```

- Install and activate the Conda environment:

```
conda env create -f environment.yml
conda activate rl_reach
```
Note: this environment assumes that you have CUDA 11.1 installed. If you are using another version of CUDA, you will have to install PyTorch manually, as described in the PyTorch installation instructions (https://pytorch.org/get-started/locally/).
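A quick way to check that the installed PyTorch build matches your CUDA version (a generic check, not part of the repository's test suite):

```python
import torch

print(torch.version.cuda)         # expected to report 11.1 for this environment
print(torch.cuda.is_available())  # True if a compatible GPU and driver are visible
```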
- Install the custom Gym environments:

```
cd gym_envs/
pip install -e .
```
Alternatively, use the Docker container (see section below).
Manual tests:

```
python tests/manual/1_test_widowx_env.py
python tests/manual/2_test_train.py
python tests/manual/3_test_enjoy.py
python tests/manual/4_test_pytorch.py
```

Automated tests:

```
pytest tests/auto/all_tests.py -v
```
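For reference, automated tests of this kind usually just check that every environment can be created, reset and stepped. A minimal sketch in that style (hypothetical, not the repository's actual test file; `import gym_envs` is an assumed import name):

```python
import gym
import gym_envs  # assumed import name; registers the widowx_reacher IDs
import pytest

ENV_IDS = [
    "widowx_reacher-v1",
    "widowx_reacher-v2",
    "widowx_reacher-v3",
    "widowx_reacher-v4",
]

@pytest.mark.parametrize("env_id", ENV_IDS)
def test_reset_and_step(env_id):
    env = gym.make(env_id)
    obs = env.reset()
    assert obs is not None
    obs, reward, done, info = env.step(env.action_space.sample())
    assert isinstance(info, dict)
    env.close()
```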
RL experiments can be launched with the script `run_experiments.py`.

Usage:
- `--exp-id`: Unique experiment ID (int)
- `--algo`: RL algorithm (str: a2c, ddpg, her, ppo, sac, td3)
- `--env`: Training environment ID (str: widowx_reacher-v1, widowx_reacher-v2, widowx_reacher-v3, widowx_reacher-v4)
- `--n-timesteps`: Number of training steps (int)
- `--n-seeds`: Number of runs with different initialisation seeds (int)
Example:

```
python run_experiments.py --exp-id 99 --algo ppo --env widowx_reacher-v1 --n-timesteps 10000 --n-seeds 3
```
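For context, the training scripts are adapted from Stable Baselines Zoo, so the experiment above boils down to a Stable Baselines3 training run. A hand-rolled equivalent (a sketch of the idea, not the repository's actual pipeline; the save path is hypothetical):

```python
import gym
import gym_envs  # assumed import name; registers the widowx_reacher IDs
from stable_baselines3 import PPO

env = gym.make("widowx_reacher-v1")
# One seed shown here; run_experiments.py repeats the run for each of --n-seeds
model = PPO("MlpPolicy", env, seed=0, verbose=1)
model.learn(total_timesteps=10000)
model.save("logs/ppo_widowx_reacher-v1")  # hypothetical path
```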
Run all experiments:

```
./run_all_exp.sh
```
Trained models can be evaluated and the results saved with the script `evaluate_policy.py`.

Usage:
- `--exp-id`: Unique experiment ID (int)
- `--n-eval-steps`: Number of evaluation timesteps (int)
- `--log-info`: Log information at each evaluation step and save (0 or 1)
- `--plot-dim`: Plot end effector and goal position in real time (0: don't plot (default), 2: 2D, 3: 3D)
- `--render`: Render environment during evaluation (0 or 1)
Example:

```
python evaluate_policy.py --exp-id 99 --n-eval-steps 1000 --log-info 0 --plot-dim 0 --render 0
```
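Conceptually, the evaluation loads the saved model and rolls it out in the environment. A minimal Stable Baselines3 sketch of what `--n-eval-steps 1000` amounts to (the model path is hypothetical and matches the training sketch above):

```python
import gym
import gym_envs  # assumed import name; registers the widowx_reacher IDs
from stable_baselines3 import PPO

env = gym.make("widowx_reacher-v1")
model = PPO.load("logs/ppo_widowx_reacher-v1")  # hypothetical path

obs = env.reset()
for _ in range(1000):  # mirrors --n-eval-steps 1000
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```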
The evaluation also produces an environment evaluation plot and experiment learning curves (example figures are shown in the repository).
The evaluation metrics, the environment's variables, the hyperparameters used during training and the parameters used for evaluating the environments are logged for each experiment in the file `benchmark/benchmark_results.csv`. Evaluation metrics of selected experiment IDs can be plotted with the script `scripts/plot_benchmark.py`.
Usage:
- `--exp-list`: List of experiment IDs to consider for plotting (list of int)
- `--col`: Name of the hyperparameter for the X axis; see column names in `benchmark/benchmark_results.csv` (str)
Example:

```
python scripts/plot_benchmark.py --exp-list 26 27 28 29 --col n_timesteps
```
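Since the benchmark file is a plain CSV, the results can also be explored directly with pandas. A minimal sketch (the `exp_id` and `mean_reward` column names are assumptions; inspect `df.columns` for the real ones. `n_timesteps` is the column used in the example above):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("benchmark/benchmark_results.csv")

exp_list = [26, 27, 28, 29]
subset = df[df["exp_id"].isin(exp_list)]  # 'exp_id' is a hypothetical column name

plt.scatter(subset["n_timesteps"], subset["mean_reward"])  # 'mean_reward' is hypothetical
plt.xlabel("n_timesteps")
plt.ylabel("mean_reward")
plt.savefig("benchmark_plot.png")
```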
Hyperparameters can be tuned with the script `train.py -optimize`.

Usage:
- `--algo`: RL algorithm (str: a2c, ddpg, her, ppo, sac, td3)
- `--env`: Training environment ID (str: widowx_reacher-v1, widowx_reacher-v2, widowx_reacher-v3, widowx_reacher-v4)
- `--n-timesteps`: Number of training steps (int)
- `--n-trials`: Number of optimisation trials (int)
- `--n-jobs`: Number of parallel jobs (int)
- `--sampler`: Sampler for the optimisation search (str: random, tpe, skopt)
- `--pruner`: Pruner to kill unpromising trials early (str: halving, median, none)
- `--n-startup-trials`: Number of trials before using the Optuna sampler (int)
- `--n-evaluations`: Number of episodes used to evaluate a trial (int)
- `--log-folder`: Log folder for the results (str)
Example:

```
python train.py -optimize --algo ppo --env widowx_reacher-v1 --n-timesteps 100000 --n-trials 100 --n-jobs 8 --sampler tpe --pruner median --n-startup-trials 10 --n-evaluations 10 --log-folder logs/opti
```
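Under the hood, Zoo-style tuning of this kind is built on Optuna, and the flags above map onto Optuna concepts. A self-contained sketch of the same idea (the search space, timestep budget and evaluation settings are illustrative, not the repository's actual configuration):

```python
import gym
import gym_envs  # assumed import name; registers the widowx_reacher IDs
import optuna
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial):
    # Illustrative one-parameter search space
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    env = gym.make("widowx_reacher-v1")
    model = PPO("MlpPolicy", env, learning_rate=learning_rate, verbose=0)
    model.learn(total_timesteps=5000)  # short budget for the sketch
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=5)
    env.close()
    return mean_reward

study = optuna.create_study(
    direction="maximize",
    sampler=TPESampler(n_startup_trials=10),  # mirrors --sampler tpe, --n-startup-trials 10
    pruner=MedianPruner(),                    # mirrors --pruner median
)
study.optimize(objective, n_trials=100, n_jobs=8)  # mirrors --n-trials, --n-jobs
print(study.best_params)
```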
Optimise all experiments:

```
./opti_all.sh
```
Pull the Docker image (CPU or GPU):

```
docker pull rlreach/rlreach-cpu
docker pull rlreach/rlreach-gpu
```

or build the images from the Dockerfiles:

```
docker build -t rlreach-cpu:latest . -f docker/Dockerfile_cpu
docker build -t rlreach-gpu:latest . -f docker/Dockerfile_gpu
```
Run commands inside the Docker container with `run_docker_cpu.sh` and `run_docker_gpu.sh`.

Example:

```
./docker/run_docker_cpu.sh python run_experiments.py --exp-id 99 --algo ppo --env widowx_reacher-v1 --n-timesteps 30000 --n-seeds 2
./docker/run_docker_cpu.sh python evaluate_policy.py --exp-id 99 --n-eval-steps 1000 --log-info 0 --plot-dim 0 --render 0
```
Note: the GPU image requires nvidia-docker.
Tested on:
- Ubuntu 18.04
- Python 3.7.9
- Conda 4.9.2
- CUDA 11.1