SocialCompliantRobot (SCR)

Human empowerment driven RL

This repository contains the code for my paper, Social navigation with human empowerment driven reinforcement learning. Please refer to the paper for more details.

Abstract

The next generation of mobile robots needs to be socially-compliant to be accepted by humans. As simple as this task may seem, defining compliance formally is not trivial. Yet, classical reinforcement learning (RL) relies upon hard-coded reward signals. In this work, we go beyond this approach and provide the agent with intrinsic motivation using empowerment. Empowerment maximizes the influence of an agent on its near future and has been shown to be a good model for biological behaviors. It also has been used for artificial agents to learn complicated and generalized actions. Self-empowerment maximizes the influence of an agent on its own future. On the contrary, our robot strives for the empowerment of people in its environment, so they are not disturbed by the robot when pursuing their goals. We show that our robot has a positive influence on humans, as it minimizes the travel time and distance of humans while moving efficiently to its own goal. The method can be used in any multi-agent system that requires a robot to solve a particular task involving human interactions.

Method Overview

The robot uses the states of all its neighbors to compute their empowerment. Empowerment is used in addition to state-value estimates. In this way, SCR learns both to avoid colliding with humans (from the state-values) and to give them the freedom to pursue their goals (from the empowerment estimates).
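
For intuition, the following minimal sketch shows one way a state-value estimate and the neighbors' empowerment estimates could be combined to score candidate robot actions. The weighting factor beta, the function names, and the additive combination are illustrative assumptions, not the exact formulation from the paper.

def score_action(next_robot_state, next_human_states, value_fn, empowerment_fn, beta=0.5):
    # Hypothetical scoring rule: favour actions that serve the robot's own goal
    # (state-value) while keeping nearby humans empowered.
    robot_value = value_fn(next_robot_state)                               # V(s') estimate
    human_empowerment = sum(empowerment_fn(h) for h in next_human_states)  # empowerment of each neighbor
    return robot_value + beta * human_empowerment                          # higher is better

The robot would then execute the candidate action with the highest score.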

Each human's state is an occupancy map centered on that human. These maps provide enough information to tell whether an action would have an influence, because occupied areas block the human's movement. Empowerment is computed from these maps and the humans' actions, which are two continuous values (dx, dy movements) sampled from normal distributions.
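
As a rough illustration of how empowerment could be estimated from such occupancy maps and Gaussian (dx, dy) actions, below is a small self-contained sketch of a standard variational lower-bound estimator with a source and a planning network. The map size, network sizes, toy transition model, and all names are assumptions made for illustration; they are not the architecture from the paper.

import torch
import torch.nn as nn

MAP_DIM = 16 * 16   # flattened occupancy map centred on the human
ACT_DIM = 2         # continuous (dx, dy) movement

class GaussianHead(nn.Module):
    # Outputs a diagonal Gaussian over (dx, dy) actions.
    def __init__(self, in_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, ACT_DIM)
        self.log_std = nn.Linear(64, ACT_DIM)

    def forward(self, x):
        h = self.body(x)
        return torch.distributions.Normal(self.mu(h), self.log_std(h).exp())

source = GaussianHead(MAP_DIM)                       # p(a | s): proposes actions
planning = GaussianHead(2 * MAP_DIM)                 # q(a | s, s'): recovers actions
transition = nn.Linear(MAP_DIM + ACT_DIM, MAP_DIM)   # toy forward model

def empowerment_estimate(occ_map, n_samples=32):
    # Variational lower bound on empowerment: E[ log q(a | s, s') - log p(a | s) ].
    s = occ_map.expand(n_samples, -1)
    p_a = source(s)
    a = p_a.rsample()                                  # sampled (dx, dy) actions
    s_next = transition(torch.cat([s, a], dim=-1))     # predicted next occupancy map
    q_a = planning(torch.cat([s, s_next], dim=-1))
    return (q_a.log_prob(a) - p_a.log_prob(a)).sum(-1).mean()

emp = empowerment_estimate(torch.rand(1, MAP_DIM))     # empowerment of one human
print(float(emp))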

[Videos: IL (after imitation learning) | SCR (after empowerment training)]
Just after imitation learning, the robot learns to avoid collisions and to reach its goal. After training with empowerment, it also learns to give way to people. The blue maps are the states of the humans.

Setup

  1. Install the Python-RVO2 library.
  2. Install crowd_sim and crowd_nav as editable packages with pip (a quick import check is sketched below):
pip install -e .
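
After these two steps, a quick sanity check (a minimal sketch; the import names below simply assume the package names used in this repository) is to confirm that the simulator bindings and both installed packages import cleanly:

import rvo2        # bindings provided by the Python-RVO2 library
import crowd_sim   # simulation environment package
import crowd_nav   # training and testing package
print("setup looks OK")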

Getting started

  1. Check and edit the shaping configuration:
/crowd_nav/configs/shaping.config
  2. Run everything at once:
cd crowd_nav
bash train.sh

This repository is organized in two parts: the gym_crowd/ folder contains the simulation environment and the crowd_nav/ folder contains the code for training and testing the policies. Details of the simulation framework can be found here. Below are the instructions for training and testing policies; they should be executed inside the crowd_nav/ folder.

  1. Train a policy:
python train.py --policy scr
  2. Test policies with 500 test cases:
python acceptance_test_score.py --policy orca --phase test
python acceptance_test_score.py --policy scr --model_dir data/output --phase test
  3. Visualize a test case and optionally save a video or a plot:
python acceptance_test_visualisation.py --policy scr --model_dir data/output --phase test --visualize --test_case 0 --plot_file data/output/plot.png
python acceptance_test_visualisation.py --policy scr --model_dir data/output --phase test --visualize --test_case 0 --video_file data/output/video.mp4
  4. Plot the training curve:
python utils/plot.py data/output/output.log

Simulation Videos

[Videos: CADRL, LSTM-RL, SARL, OM-SARL]

Interactive Simulation

The policies can also be tested by manually controlling the position of the human, as follows:

python acceptance_test_interaction.py --policy scr

Typical workflow

  1. Edit the configs and train.sh.
  2. Run bash train.sh.
  3. Run python utils/extract.py hogehoge/output.log --label=hogehoge, replacing hogehoge with your output directory and label.