navbot

This is a collection for mapless robot navigation using RGB images as visual input. It contains the test
environment and motion planners, aiming at realizing all three levels of mapless navigation:
1. memorizing efficiently;
2. from memorizing to reasoning;
3. more powerful reasoning.
The simulation experiment data is in the ./materials/record folder.


Environment

I built the environment for testing the algorithms.

It has the following properties:
  • Diverse complexity.
  • Gym-style interface (a minimal usage sketch follows below).
  • ROS support.
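
To give a sense of what the Gym-style interface implies, here is a minimal episode loop. The class name NavBotEnv, the observation shape, and the (linear, angular) action format are assumptions for illustration only; the actual environment code lives in rl_nav/scripts.

# Minimal sketch of driving a Gym-style environment.
# `NavBotEnv` and the (linear, angular) action format are assumptions for
# illustration; check rl_nav/scripts for the actual interface.
import numpy as np

def run_episode(env, policy, max_steps=500):
    obs = env.reset()                          # initial RGB observation
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)                   # e.g. (linear, angular) velocity
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:                               # goal reached or collision
            break
    return total_reward

# Example usage with a random policy:
# env = NavBotEnv()
# run_episode(env, lambda obs: np.random.uniform(-1.0, 1.0, size=2))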

Memorizing

VAE

Structure
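
A minimal sketch of the idea: a convolutional encoder compresses each RGB observation into a low-dimensional latent vector, and a decoder reconstructs the image from that vector. The layer sizes, the 64x64 input resolution, and the 32-dimensional latent space below are illustrative assumptions, not necessarily the exact architecture used here.

# Illustrative VAE sketch in TensorFlow 1.x; layer sizes and the latent
# dimension are assumptions, not the architecture used in this repo.
import tensorflow as tf

def build_vae(images, latent_dim=32):
    # images: [batch, 64, 64, 3] RGB observations scaled to [0, 1]
    h = tf.layers.conv2d(images, 32, 4, strides=2, padding='same', activation=tf.nn.relu)
    h = tf.layers.conv2d(h, 64, 4, strides=2, padding='same', activation=tf.nn.relu)
    h = tf.layers.conv2d(h, 128, 4, strides=2, padding='same', activation=tf.nn.relu)
    h = tf.layers.flatten(h)

    # Encoder outputs the mean and log-variance of the latent Gaussian.
    mu = tf.layers.dense(h, latent_dim)
    logvar = tf.layers.dense(h, latent_dim)
    z = mu + tf.exp(0.5 * logvar) * tf.random_normal(tf.shape(mu))  # reparameterization trick

    # Decoder reconstructs the observation from the latent code z.
    d = tf.layers.dense(z, 8 * 8 * 128, activation=tf.nn.relu)
    d = tf.reshape(d, [-1, 8, 8, 128])
    d = tf.layers.conv2d_transpose(d, 64, 4, strides=2, padding='same', activation=tf.nn.relu)
    d = tf.layers.conv2d_transpose(d, 32, 4, strides=2, padding='same', activation=tf.nn.relu)
    recon = tf.layers.conv2d_transpose(d, 3, 4, strides=2, padding='same', activation=tf.nn.sigmoid)

    # Loss = pixel reconstruction error + KL divergence to the unit Gaussian prior.
    recon_loss = tf.reduce_sum(tf.square(images - recon), axis=[1, 2, 3])
    kl = -0.5 * tf.reduce_sum(1.0 + logvar - tf.square(mu) - tf.exp(logvar), axis=1)
    return z, recon, tf.reduce_mean(recon_loss + kl)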

Result

VAE-based Proposed Planner Compared with Benchmark

  1. The proposed planner's trajectory is blue and the benchmark's is green.

  2. The success rate comparison in maze1.

From Memorizing to Reasoning

Stacked LSTM and network structure

Stacked LSTM

Network structure
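
A minimal sketch of a stacked LSTM operating on the sequence of VAE latent codes, which lets the planner reason over history rather than single frames. The layer count and unit sizes below are illustrative assumptions, not the exact network.

# Illustrative stacked-LSTM sketch in TensorFlow 1.x; the layer count and
# unit sizes are assumptions, not the exact network used in this repo.
import tensorflow as tf

def build_stacked_lstm(latent_seq, num_layers=2, num_units=256):
    # latent_seq: [batch, time, latent_dim] sequence of VAE latent codes
    cells = [tf.nn.rnn_cell.LSTMCell(num_units) for _ in range(num_layers)]
    stacked = tf.nn.rnn_cell.MultiRNNCell(cells)
    # outputs: [batch, time, num_units]; final_state carries memory across steps
    outputs, final_state = tf.nn.dynamic_rnn(stacked, latent_seq, dtype=tf.float32)
    return outputs, final_state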

Result

Success rate in maze1

Install

Dependencies

tensorflow: 1.5.0
OS: Ubuntu 16.04
Python: 2.7
ROS: Kinetic
Gazebo: 7
tensorforce: https://github.com/tensorforce/tensorforce

Run

sudo apt-get install ros-kinetic-gazebo-ros-pkgs ros-kinetic-gazebo-ros-control
sudo apt-get install ros-kinetic-turtlebot-*
sudo apt-get remove ros-kinetic-turtlebot-description
sudo apt-get install ros-kinetic-kobuki-description
# change to catkin_ws/src
git clone https://github.com/marooncn/navbot
cd ..
catkin_make
source ./devel/setup.bash
# you can change the configuration in config.py
cd src/navbot/rl_nav/scripts
# run the proposed model for memorizing
python PPO.py
# run the proposed model for reasoning
python E2E_PPO_rnn.py

Details

  1. The default environment is maze1; you need to change maze_id in nav_gazebo.launch and config.py if you want to change the environment.

  2. maze1 and maze2 are sped up 10 times for training. If you want to speed up other environments, just change

    <max_step_size>0.001</max_step_size>
    <real_time_factor>1</real_time_factor>
    

    to

    <max_step_size>0.01</max_step_size>
    <!-- <real_time_factor>1</real_time_factor> -->
    

    in the environment file in worlds.

  3. To reproduce the result, please change the related parameters in config.py according to config.txt.

  4. PPO is not a deterministic policy gradient algorithm: the action at every timestep is sampled from the policy distribution. This sampling can be seen as "noise" and is useful for exploration and generalization. If you want to use the best strategy after the model is trained, just set deterministic = True in config.py and the performance will improve, as illustrated below.
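
A rough illustration of the difference, assuming a Gaussian policy head (this is not the repo's exact code): during training the action is sampled from the distribution, while with deterministic = True the mean action is used.

# Illustrative only: a Gaussian policy head; the actual policy and config
# handling in this repo may differ.
import numpy as np

def select_action(mean, std, deterministic=False):
    # mean, std: policy outputs for the current observation
    if deterministic:
        return mean                          # greedy action after training
    return np.random.normal(mean, std)       # sampled action, acts as exploration noise

# Training-time behaviour (stochastic):
# select_action(np.array([0.3, 0.0]), np.array([0.1, 0.2]))
# Evaluation with deterministic = True in config.py (greedy):
# select_action(np.array([0.3, 0.0]), np.array([0.1, 0.2]), deterministic=True)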

Cite

If you find this work helpful in your research, please cite the following papers:

Reference

tensorforce (blog)
gym_gazebo
gazebo
roslaunch python API
turtlebot_description
kobuki_description
WorldModelsExperiments (official)
WorldModels (by Applied Data Science)