ROS-Jackal

This is the repository for the paper "Benchmarking Reinforcement Learning Techniques for Autonomous Navigation", which is currently under review.

The results shown in the paper were produced on a Condor cluster that distributes 100 actors to collect trajectories. This setup greatly speeds up training and made it feasible to finish all the experiments presented in the paper; however, a Condor cluster is inaccessible to most users. To guarantee reproducibility, we instead provide this version of the repository, which distributes the actors over 10 Singularity containers that can run locally on a single machine.

Installation

  1. Clone this repository
git clone https://github.com/Daffan/ros_jackal.git
cd ros_jackal
  2. In your virtual environment, install the Python dependencies:
pip install -r requirements.txt
  3. Follow the instructions at https://sylabs.io/guides/3.0/user-guide/installation.html to install Singularity. Singularity version >= 3.6.3 is recommended to build the image.

  4. (Only follow this step if you really need it!) The code does not require a ROS installation, since the rollouts happen inside the containers, but if you want to develop on top of our repo, running the ROS and Gazebo simulation outside the container enables the GUI and is easier to debug. Follow the steps below to install the ROS dependencies (assuming ROS Melodic is already installed):

  • Create ROS workspace
mkdir -p /<YOUR_HOME_DIR>/jackal_ws/src
cd /<YOUR_HOME_DIR>/jackal_ws/src
  • Clone this repo and the required ROS packages
git clone https://github.com/Daffan/ros_jackal.git
git clone https://github.com/jackal/jackal.git --branch melodic-devel
git clone https://github.com/jackal/jackal_simulator.git --branch melodic-devel
git clone https://github.com/jackal/jackal_desktop.git --branch melodic-devel
git clone https://github.com/utexas-bwi/eband_local_planner.git
  • Install ROS package dependencies
cd ..
source /opt/ros/melodic/setup.bash
sudo rosdep init; rosdep update
rosdep install -y --from-paths . --ignore-src --rosdistro=melodic
  • Build the workspace
catkin_make
source devel/setup.bash
  5. Verify your installation (this script runs the OpenAI Gym environment for 5 episodes):

Pull the image file (replace <PATH_TO_THIS_REPO> in the command; the image file is ~3 GB; [XXX] is the anonymized Singularity cloud account):

singularity pull --name <PATH_TO_THIS_REPO>/local_buffer/nav_benchmark.sif library://[XXX]/ros_jackal_image/image:latest
./singularity_run.sh <PATH_TO_THIS_REPO>/local_buffer/nav_benchmark.sif python3 test_env.py
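
If the check fails, it helps to know roughly what it does. Below is a minimal sketch of the kind of rollout test_env.py performs; the environment id, the envs import, and the pre-0.26 Gym step API are assumptions for illustration, so consult test_env.py for the actual entry point.

import gym
import envs  # assumed module that registers the ros_jackal Gym environments

env = gym.make("motion_control_continuous_laser-v0")  # hypothetical env id
for episode in range(5):
    obs = env.reset()
    done = False
    ep_return, steps = 0.0, 0
    while not done:
        # a random policy is enough to smoke-test the simulation loop
        obs, reward, done, info = env.step(env.action_space.sample())
        ep_return += reward
        steps += 1
    print("episode %d: return %.2f in %d steps" % (episode + 1, ep_return, steps))
env.close()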

Train a deep RL navigation policy

To train a navigation policy, you only need to specify a .yaml file that contains the parameters for the specific experiment. For instance,

python train.py --config configs/e2e_default.yaml
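
Conceptually, train.py turns the .yaml file into a dictionary of experiment parameters and starts the learner and actors from it. A minimal sketch of that pattern is shown below; the section names env_config, training_config, and policy_config are illustrative assumptions, not the repo's actual schema.

import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="configs/e2e_default.yaml")
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)  # one dict holding all experiment parameters

# hypothetical sections; the real keys are defined by the files in configs/
env_config = config.get("env_config", {})
training_config = config.get("training_config", {})
policy_config = config.get("policy_config", {})
print(env_config, training_config, policy_config)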

The full list of .yaml files used in our experiments is provided at the end of this README.

This repo saves the trajectories collected by each actor in a local buffer folder, and the actors load the most recent policy from the same folder. By default, the buffer folder is a folder named local_buffer in the current directory. You can specify a different folder with export BUFFER_FOLDER=/PATH/TO/YOUR/BUFFER_FOLDER. The logging files can be found under the logging folder.
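
The buffer folder acts as a simple file-based queue between the learner and the actors. The sketch below illustrates that handoff; the file names, the pickle format, and the PyTorch checkpoint are assumptions for illustration, not the repo's exact implementation.

import os
import pickle

import torch

BUFFER_FOLDER = os.environ.get("BUFFER_FOLDER", "local_buffer")

def save_trajectory(actor_id, traj_id, trajectory):
    # one file per finished trajectory, so the learner can consume each file atomically
    path = os.path.join(BUFFER_FOLDER, "actor_%d_traj_%d.pickle" % (actor_id, traj_id))
    with open(path, "wb") as f:
        pickle.dump(trajectory, f)

def load_latest_policy(policy):
    # the learner periodically overwrites a single checkpoint file (assumed name)
    checkpoint = os.path.join(BUFFER_FOLDER, "policy_latest.pth")
    if os.path.exists(checkpoint):
        policy.load_state_dict(torch.load(checkpoint))
    return policy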

Results

Success rate of policies trained with different neural network architectures and history lengths in the static (top), dynamic-box (middle), and dynamic-wall (bottom) environments.

Static
History length    1          4          8
MLP               65 ± 4%    57 ± 7%    42 ± 2%
GRU               -          51 ± 2%    43 ± 4%
CNN               -          55 ± 4%    45 ± 5%
Transformer       -          68 ± 2%    46 ± 3%

Dynamic box
History length    1          4          8
MLP               50 ± 5%    35 ± 2%    46 ± 3%
GRU               -          48 ± 4%    45 ± 1%
CNN               -          42 ± 5%    40 ± 1%
Transformer       -          52 ± 1%    44 ± 4%

Dynamic wall
History length    1          4           8
MLP               67 ± 7%    72 ± 1%     69 ± 4%
GRU               -          82 ± 4%     78 ± 5%
CNN               -          63 ± 3%     43 ± 3%
Transformer       -          33 ± 28%    15 ± 13%

Success rate, survival time, and traversal time of policies trained with different safe-RL methods, MPC with a probabilistic transition model, and DWA.

Safe-RL method    MLP          Lagrangian     MPC            DWA
Success rate      65 ± 4%      74 ± 2%        70 ± 3%        43%
Survival time     8.0 ± 1.5s   16.2 ± 2.5s    55.7 ± 4.9s    88.6s
Traversal time    7.5 ± 0.3s   8.6 ± 0.2s     24.7 ± 2.0s    38.5s

Success rate of policies trained with different model-based methods and different numbers of transition samples.

Transition samples          100k       500k        2000k
MLP                         13 ± 7%    58 ± 2%     65 ± 4%
Dyna-style deterministic    8 ± 2%     30 ± 10%    66 ± 5%
MPC deterministic           0 ± 0%     21 ± 10%    62 ± 3%
Dyna-style probabilistic    0 ± 0%     48 ± 4%     70 ± 1%
MPC probabilistic           0 ± 0%     45 ± 4%     70 ± 3%

Success rate of policies trained with different numbers of training environments.

Environments    5          10         50         100        250
Success rate    43 ± 3%    54 ± 8%    65 ± 4%    72 ± 6%    74 ± 2%

(See below for all the config files used to reproduce the experiments)

 └─configs
 │ └─safe_rl
 │ │ └─mpc.yaml
 │ │ └─mlp.yaml
 │ │ └─lagrangian.yaml
 │ └─architecture_static
 │ │ └─mlp_history_length_4.yaml
 │ │ └─cnn_history_length_8.yaml
 │ │ └─cnn_history_length_4.yaml
 │ │ └─mlp_history_length_8.yaml
 │ │ └─rnn_history_length_4.yaml
 │ │ └─mlp_history_length_1.yaml
 │ │ └─cnn_history_length_1.yaml
 │ │ └─rnn_history_length_8.yaml
 │ │ └─rnn_history_length_1.yaml
 │ │ └─transformer_history_length_1.yaml
 │ │ └─transformer_history_length_4.yaml
 │ │ └─transformer_history_length_8.yaml
 │ └─architecture_dynamic_wall
 │ │ └─cnn_history_length_1.yaml
 │ │ └─cnn_history_length_4.yaml
 │ │ └─cnn_history_length_8.yaml
 │ │ └─mlp_history_length_1.yaml
 │ │ └─mlp_history_length_4.yaml
 │ │ └─mlp_history_length_8.yaml
 │ │ └─rnn_history_length_1.yaml
 │ │ └─rnn_history_length_4.yaml
 │ │ └─rnn_history_length_8.yaml
 │ │ └─transformer_history_length_1.yaml
 │ │ └─transformer_history_length_4.yaml
 │ │ └─transformer_history_length_8.yaml
 │ └─architecture_dynamic_box
 │ │ └─cnn_history_length_1.yaml
 │ │ └─cnn_history_length_4.yaml
 │ │ └─cnn_history_length_8.yaml
 │ │ └─mlp_history_length_1.yaml
 │ │ └─mlp_history_length_4.yaml
 │ │ └─mlp_history_length_8.yaml
 │ │ └─rnn_history_length_1.yaml
 │ │ └─rnn_history_length_4.yaml
 │ │ └─rnn_history_length_8.yaml
 │ │ └─transformer_history_length_1.yaml
 │ │ └─transformer_history_length_4.yaml
 │ │ └─transformer_history_length_8.yaml
 │ └─model_based
 │ │ └─dyna.yaml
 │ │ └─mpc.yaml
 │ └─generalization
 │ │ └─num_world_50.yaml
 │ │ └─num_world_5.yaml
 │ │ └─num_world_10.yaml
 │ │ └─num_world_100.yaml
 │ │ └─num_world_250.yaml