This repository includes the official Python implementation of the MPC-PEARL algorithm presented in "Infusing Model Predictive Control into Meta-Reinforcement Learning for Mobile Robots in Dynamic Environments".
Our implementation is based on the official repository of PEARL, and the following must be installed to run it:
For FORCESPRO, you first need to activate a FORCESPRO license, which can be done here. For a detailed installation guide, please refer to the official manual.
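Once the FORCESPRO client has been downloaded and licensed, a quick sanity check is to make sure Python can import it. This is only a sketch; the path below is a placeholder for wherever you extracted the client package:

import sys
# Placeholder path: point this at the directory that contains the extracted forcespro package.
sys.path.insert(0, "/path/to/forcespro/client")
import forcespro
print("FORCESPRO client found:", forcespro.__file__)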
To analyze evaluation results, install additional packages such as pandas. For convenience, we have summarized our test environment in mpc_pearl.yml
so that the results presented in our paper can be easily reproduced.
Our code has been successfully tested on both Ubuntu 18.04 and Ubuntu 20.04.
First, clone our repository with:
git clone https://github.com/CORE-SNU/MPC-PEARL.git && cd MPC-PEARL
As mentioned above, you can easily import the requirements if you are working in a conda environment:
conda env create -f mpc_pearl.yml && conda activate mpc_pearl
We provide the network weights of the best-performing agent under ./output/Navigation_MPC/eps02. You can visualize the trained agent navigating in a simulated indoor environment with:
python test_navigation.py ./configs/Navigation_MPC ./output/Navigation_MPC/eps02 --video --num_trajs=3
Videos for 25 distinct tasks will be generated under the project root directory in .gif format, along with a summary of performance metrics in ./results_total_[TEST_TIME].csv.
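If pandas is installed, the resulting CSV can be inspected directly. The snippet below is only a minimal sketch; the exact columns depend on what test_navigation.py writes, so check them first:

import pandas as pd
# Substitute the actual timestamped file name produced by the test run.
results = pd.read_csv("./results_total_[TEST_TIME].csv")
print(results.columns.tolist())   # see which metrics are reported
print(results.describe())         # per-metric summary over the 25 tasks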
With our default settings, you can run an experiment as follows:
python launch_experiment.py ./configs/Navigation_MPC.json
You can modify the hyperparameters of PEARL in ./configs/Navigation_MPC.json and ./configs/default.py. For details of each component, please refer to the official implementation of PEARL.
Additional parameters introduced in our algorithm should be modified manually in ./rlkit/envs/navi_toy.py.
# ------------------------------ Hyperparameter setup ---------------------------------
# Main hyperparameters
self._obs_reward = 20.
self._goal_reward = 10.
self.eps = .2
# Initial and goal state
self._goal = np.array([3.5, -3.5])
self.init_state = np.array([-4.5, 4.5, -np.pi / 8.])
# --------------------------------------------------------------------------------------
- self._obs_reward : collision penalty applied when the agent collides with static or dynamic obstacles
- self._goal_reward : reaching bonus given when the agent is near the goal
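For illustration only, these two terms typically enter the per-step reward as a penalty and a bonus on top of the distance-based navigation reward. The sketch below is a hypothetical example, not the reward actually implemented in ./rlkit/envs/navi_toy.py:

import numpy as np

# Hypothetical sketch; the real reward function is defined in ./rlkit/envs/navi_toy.py.
def sketch_reward(position, goal, collided, goal_radius=0.3,
                  obs_reward=20., goal_reward=10.):
    dist_to_goal = np.linalg.norm(position - goal)
    reward = -dist_to_goal            # distance-based navigation reward
    if collided:
        reward -= obs_reward          # collision penalty (self._obs_reward)
    if dist_to_goal < goal_radius:
        reward += goal_reward         # reaching bonus (self._goal_reward)
    return reward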
As a baseline, the vanilla PEARL algorithm can be trained simply by running launch_experiment.py with ./configs/Navigation_WO_MPC.json.
During training, the summary of each training epoch and the network weights are saved under the ./output/Navigation_MPC/[EXP_START_TIME] directory. To see the learning curves for various performance metrics, run the following:
cd output && python plot_summary.py ./Navigation_MPC/[EXP_WANT_TO_SEE]
You can also visualize the results of multiple experiments by running
cd output && python plot_summary.py ./Navigation_MPC/[EXP_WANT_TO_SEE_1], ./Navigation_MPC/[EXP_WANT_TO_SEE_2], ...
The plots contain the following performance metrics:
- Total return : undiscounted cumulative reward
- Arrival time : time required to reach the goal (80 if the goal is not reached)
- Navigation reward : distance-based reward component
- Collisions : number of collisions
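Beyond plot_summary.py, the per-epoch logs can also be aggregated with pandas. The snippet below is a minimal sketch; the file name progress.csv and the directory placeholder are assumptions based on the rlkit logger, so adjust them to match your experiment folder:

import pandas as pd
# progress.csv is the per-epoch log written by the rlkit logger during training.
log = pd.read_csv("./output/Navigation_MPC/[EXP_WANT_TO_SEE]/progress.csv")
print(log.columns.tolist())   # list the logged metrics
print(log.tail())             # inspect the last few training epochs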
To see how the trained agent navigates in dynamic, cluttered environments, run the following line:
python test_navigation.py ./configs/Navigation_MPC.json ./output/Navigation_MPC/[EXP_WANT_TO_VISUALIZE]
Note that the configuration file given as an argument must be the same for both training and testing. The summary of the results appears in ./results_total_[TEST_TIME].csv.
The following options can be given to test_navigation.py:
- --video : a navigation video will be saved in .gif format. Default = False
- --mpc_only : if true, the agent will use only MPC for navigation. Default = False
- --num_trajs : number of adaptation steps to use. Default = 10
Our environment runs with pre-computed paths for each dynamic obstacle. You can regenerate these scenarios based on our setup by deleting U.npy and X.npy under ./scenarios.
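If you want to inspect the pre-computed scenarios before regenerating them, they are plain NumPy arrays. The snippet below is a minimal sketch; the array layout (which axis indexes obstacles versus time steps) is an assumption to verify against the environment code:

import numpy as np
# X.npy / U.npy store the pre-computed obstacle states and inputs.
# Check the shapes to see how obstacles and time steps are laid out before editing them.
X = np.load("./scenarios/X.npy")
U = np.load("./scenarios/U.npy")
print("X shape:", X.shape)
print("U shape:", U.shape)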
To speed up training, you may pretrain the Gaussian process regression (GPR) model offline. This can be done by running
python gen_GP.py ./configs/Navigation_MPC.json
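gen_GP.py builds the offline GP dataset used for predicting obstacle motion. As a rough, hedged illustration of the underlying idea (not the repository's actual implementation), a Gaussian process can be fit to an obstacle's past positions and used to extrapolate its next position:

# Illustrative only: fit a GP to an obstacle's observed x-positions over time
# and predict one step ahead. The actual pipeline lives in gen_GP.py.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.arange(10).reshape(-1, 1)    # time steps observed so far
x = np.sin(0.3 * t).ravel()         # example obstacle x-coordinate trajectory
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(t, x)
x_next, x_std = gp.predict(np.array([[10]]), return_std=True)
print(f"predicted next position: {x_next[0]:.3f} +/- {x_std[0]:.3f}")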
We also demonstrate the proposed method in the UCY sidewalk environment.
To use it, first unzip ucy.zip and move into ./ucy, which has the same structure as the MPC-PEARL directory.
To run an experiment, first build the offline GP dataset by running
python gen_GP.py ./configs/UCY_MPC.json
Then, use the same code with another configuration file:
python launch_experiment.py ./configs/UCY_MPC.json
The most commonly reported problems we have encountered so far are as follows:
OSError: [Errno 12] Cannot allocate memory
This is a known error from gym; a solution can be found in the official repo:
sudo bash -c "echo vm.overcommit_memory=1 >> /etc/sysctl.conf"
sudo sysctl -p