ON-POLICY

Supported algorithms

Algorithms	recurrent version	mlp version	cnn version	share-base version	independent version
MAPPO	✔️	✔️	✔️	✔️	✔️
MAPPG	✔️	✔️	✔️	✔️	✔️
MATRPO¹	✔️	✔️	✔️	✔️	✔️

Supported environments:

Note: we sometimes modify the environment code to fit our tasks and settings.

TODOs:

multi-agent FLOW

1. Install

1.1 instructions

Tested with CUDA 10.1.

   conda create -n marl
   conda activate marl
   pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
   cd onpolicy
   pip install -e .

1.2 hyperparameters

config.py: contains all hyper-parameters
default: use GPU, chunk-version recurrent policy and shared policy
other important hyperparameters:
- use_centralized_V: centralized training (MA) or independent training (I)
- use_single_network: share the base network or not
- use_recurrent_policy: RNN or MLP policy
- use_eval: turn on evaluation during training; if True, you need to set "n_eval_rollout_threads"
- wandb_name: for example, if your wandb link is https://wandb.ai/mapping, then you need to change wandb_name to "mapping".
- user_name: only controls the program name shown in "nvidia-smi".

2. StarCraftII

2.1 Install StarCraftII 4.10

unzip SC2.4.10.zip
# password is iagreetotheeula
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc

Download the SMAC maps and move them to ~/StarCraftII/Maps/.
If you want stable IDs, copy stableid.json from https://github.com/Blizzard/s2client-proto.git to ~/StarCraftII/.
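
Before training, it can help to confirm that the directory layout described above is in place. The following is a minimal sketch, not a script shipped with this repository: `check_sc2_install` is a hypothetical helper name, and it only checks that the StarCraft II directory and its Maps subdirectory exist.

```shell
# Hypothetical helper (not part of this repository): report whether a
# StarCraft II directory looks usable for SMAC.
# Usage: check_sc2_install [dir]   (defaults to $SC2PATH, then ~/StarCraftII)
check_sc2_install() {
    local sc2_dir="${1:-${SC2PATH:-$HOME/StarCraftII}}"
    if [ ! -d "$sc2_dir" ]; then
        echo "missing StarCraft II directory: $sc2_dir" >&2
        return 1
    fi
    if [ ! -d "$sc2_dir/Maps" ]; then
        echo "missing Maps directory: $sc2_dir/Maps" >&2
        return 1
    fi
    echo "ok: $sc2_dir"
}
```

Calling `check_sc2_install` with no argument checks the directory named by SC2PATH; a non-zero exit status means one of the two directories is missing.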

2.2 Train StarCraftII

train_smac.py: contains all training code
- Here is an example:

  conda activate marl
  cd scripts
  chmod +x train_smac.sh
  ./train_smac.sh

Local results are stored in the folder scripts/results. To see training curves, log in to wandb first (see the wandb guide). GPU memory is sometimes leaked; in that case, clear it manually.

   ./clean_gpu.sh

2.3 Tips

Sometimes StarCraftII exits abnormally, and you need to kill the program manually.

   ./clean_smac.sh
   ./clean_zombie.sh

If you want to run the MADDPG/MATD3/MASAC algorithms, you are welcome to use the offpolicy repository.

3. Hanabi

3.1 Hanabi

The environment code is adapted from the open-source Hanabi environment, with some minor changes to fit the algorithms. Hanabi is a game for 2-5 players, best described as a type of cooperative solitaire.

3.2 Install Hanabi

   pip install cffi
   cd envs/hanabi
   mkdir build && cd build
   cmake ..
   make -j
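
A quick way to confirm that the build produced the shared library is to search for it. This is a sketch under assumptions: `find_hanabi_lib` is a hypothetical helper name, and because the exact output location may depend on the CMake configuration, it searches the whole directory tree it is given rather than a fixed path.

```shell
# Hypothetical helper: look for the compiled Hanabi library under a directory.
# Usage: find_hanabi_lib [root]   (defaults to the current directory)
find_hanabi_lib() {
    local root="${1:-.}"
    local lib
    # Take the first match only; the build is expected to produce one library.
    lib="$(find "$root" -name 'libpyhanabi.so' -type f 2>/dev/null | head -n 1)"
    if [ -z "$lib" ]; then
        echo "libpyhanabi.so not found under $root; re-run cmake and make" >&2
        return 1
    fi
    echo "$lib"
}
```

For example, `find_hanabi_lib envs/hanabi` prints the path of the library if the build succeeded and returns a non-zero status otherwise.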

3.3 Train Hanabi

After 3.2, there will be a libpyhanabi.so file in the hanabi subfolder; then we can train Hanabi using the following commands.

   conda activate marl
   cd scripts
   chmod +x train_hanabi_forward.sh
   ./train_hanabi_forward.sh

We also have a backward-version training script, which uses a different way to calculate the reward of one turn.

   conda activate marl
   cd scripts
   chmod +x train_hanabi_backward.sh
   ./train_hanabi_backward.sh

4. MPE

4.1 Install MPE

   # install this package first
   pip install seaborn

3 Cooperative scenarios in MPE:

simple_spread: set num_agents=3
simple_speaker_listener: set num_agents=2, and use --share_policy
simple_reference: set num_agents=2
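
The scenario-to-agent-count pairing above can be captured in a small lookup so that a launch script does not hard-code the number. This is only a sketch: `mpe_num_agents` is a hypothetical helper name, and how the value is passed on to the training script is left to the caller.

```shell
# Hypothetical helper: print the num_agents value listed above for a
# cooperative MPE scenario, or fail for an unknown scenario name.
mpe_num_agents() {
    case "$1" in
        simple_spread) echo 3 ;;
        simple_speaker_listener|simple_reference) echo 2 ;;
        *) echo "unknown cooperative scenario: $1" >&2; return 1 ;;
    esac
}
```

For instance, `num_agents="$(mpe_num_agents simple_spread)"` stores the value for later use in a launch script.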

4.2 Train MPE

   conda activate marl
   cd scripts
   chmod +x train_mpe.sh
   ./train_mpe.sh

5. Hide-And-Seek

We support the multi-agent boxlocking and blueprint_construction tasks in the hide-and-seek domain.

5.1 Install Hide-and-Seek

5.1.1 Install MuJoCo

Obtain a 30-day free trial on the MuJoCo website, or a free license if you are a student.
Download the MuJoCo version 2.0 binaries for Linux.
Unzip the downloaded mujoco200_linux.zip directory into ~/.mujoco/mujoco200, and place your license key at ~/.mujoco/mjkey.txt.
Add this to your .bashrc and source your .bashrc.

   export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
   export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
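
The file layout from the steps above can be checked before installing mujoco-py. The following is a minimal sketch with a hypothetical helper name, `check_mujoco`; it only tests for the binaries directory and the license key and does not validate the key itself.

```shell
# Hypothetical helper: check that the MuJoCo 2.0 binaries and license key
# are where the steps above put them.
# Usage: check_mujoco [root]   (defaults to ~/.mujoco)
check_mujoco() {
    local root="${1:-$HOME/.mujoco}"
    local status=0
    [ -d "$root/mujoco200/bin" ] || { echo "missing $root/mujoco200/bin" >&2; status=1; }
    [ -f "$root/mjkey.txt" ] || { echo "missing $root/mjkey.txt" >&2; status=1; }
    return "$status"
}
```

Running `check_mujoco` with no argument inspects ~/.mujoco and reports every missing item rather than stopping at the first one.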

5.1.2 Install mujoco-py and mujoco-worldgen

You can install mujoco-py by running pip install mujoco-py==2.0.2.13. If you encounter bugs, refer to the official mujoco-py repo for help.
```
sudo apt-get install libgl1-mesa-dev libosmesa6-dev
```
To install mujoco-worldgen, follow these steps:

    # install mujoco-worldgen
    cd envs/hns/mujoco-worldgen/
    pip install -e .
    pip install xmltodict
    # if you encounter an enum error, uninstall enum34
    pip uninstall enum34

5.2 Train Tasks

   conda activate marl
   # boxlocking task; to train the simplified task, change the hyperparameters in box_locking.py first.
   cd scripts
   chmod +x train_boxlocking.sh
   ./train_boxlocking.sh
   # blueprint_construction task
   chmod +x train_bpc.sh
   ./train_bpc.sh
   # hide and seek task
   chmod +x train_hns.sh
   ./train_hns.sh

6. Flow

6.1 Install SUMO

cd envs/decentralized_bottlenecks/scripts

# choose the bash script that matches your platform
./setup_sumo_ubuntu1604.sh 

# by default the script writes the PATH to ~/.bashrc; if you are using zsh, copy the PATH line to ~/.zshrc first
source ~/.zshrc

# check whether SUMO is installed correctly
which sumo
sumo --version
sumo-gui
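
The same check can be scripted so that a setup script stops early when a tool is missing. This is a sketch only; `require_cmd` is a hypothetical helper name, built on the standard `command -v` shell builtin.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "'$1' is not on PATH" >&2
    return 1
}
```

For example, `require_cmd sumo && sumo --version` prints the version only when the binary can be found.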

6.2 Install Flow

pip install lxml imutils gym==0.10.5

# check whether Flow is installed correctly
python examples/sumo/sugiyama.py

7. SMARTS

- git clone SUMO; make sure to use a SUMO version < 1.8.
- Build it with cmake ../.. && make -j, then run make install. Afterwards you can use sumo in the terminal and check its version.
- git clone SMARTS and run pip install -e . (please remove unneeded packages from requirement.txt first).
- Run scl scenario build --clean ./loop, where loop is your own scenario.
- Everything is ready; run ./train_smarts.sh.
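
The version constraint on SUMO can be checked with a small comparison function. This is a sketch under assumptions: `version_lt` is a hypothetical helper name, it relies on `sort -V` (version sort, available in GNU coreutils), and the installed version string is supplied by hand rather than parsed from the output of sumo.

```shell
# Hypothetical helper: succeed if version $1 is strictly lower than version $2.
# Relies on `sort -V`, which orders dotted version strings numerically.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For example, `version_lt 1.7.0 1.8 && echo "SUMO version is supported"` prints the message, whereas `version_lt 1.10.0 1.8` fails because 1.10.0 is newer than 1.8.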

8. HighWay

training script: ./train_highway.sh
rendering script: ./render_highway.sh

9. Gibson2

cd onpolicy
# git submodule init 
# git submodule update
git submodule update --init --recursive
cd onpolicy/envs/iGibson
git submodule update --init --recursive

# if you want to use the original repo, use the following command instead of the one above.
# git clone https://github.com/StanfordVL/iGibson --recursive

pip install -e .

If you fail to clone pybind11, use the following commands.

cd iGibson
git submodule update

If you have installed iGibson successfully, you can download the dataset.

cd onpolicy/envs/iGibson/gibson2
mkdir data
cd data
wget https://storage.googleapis.com/gibson_scenes/ig_dataset.tar.gz
wget https://storage.googleapis.com/gibson_scenes/assets_igibson.tar.gz
tar -zxvf ig_dataset.tar.gz
tar -zxvf assets_igibson.tar.gz

Note: we support using a custom pybullet version to speed up the physics in iGibson. To get the speed-up, do the following steps after installation:

pip uninstall pybullet
pip install https://github.com/StanfordVL/bullet3/archive/master.zip

If you have updated the submodules, use the following commands to synchronize the updates into the onpolicy repository.

# single update
git submodule foreach git checkout master
# batch update
git submodule foreach git submodule update

10. Habitat

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple magnum scikit-image lmdb scikit-learn scikit-fmm yacs imageio-ffmpeg numpy-quaternion numba tqdm gitpython attrs==19.1.0

cd onpolicy
git submodule update --init --recursive
cd habitat/habitat-sim
./build.sh --headless # make sure you run the .sh file
cd ../habitat-lab
pip install -e .
# if you failed to install habitat-api, you can use `build.sh --headless` instead.

Remember to add PYTHONPATH to your ~/.bashrc file:

export PYTHONPATH=$PYTHONPATH:/home/yuchao/project/onpolicy/onpolicy/envs/habitat/habitat-sim/
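
If the export line is sourced more than once, the same directory is appended to PYTHONPATH repeatedly. One way to avoid that is sketched below; `pythonpath_add` is a hypothetical helper name, and the directory passed to it is whatever path your checkout uses.

```shell
# Hypothetical helper: append a directory to PYTHONPATH only if it is not
# already listed, then export the variable.
pythonpath_add() {
    local dir="$1"
    case ":${PYTHONPATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$dir" ;;
    esac
    export PYTHONPATH
}
```

Defining the function in ~/.bashrc and calling it with the habitat-sim directory has the same effect as the export line, but repeated sourcing leaves PYTHONPATH unchanged.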

cd /home/yuchao/project/onpolicy/onpolicy/envs/habitat
mkdir -p data/datasets
cd data/datasets
wget https://dl.fbaipublicfiles.com/habitat/data/datasets/pointnav/gibson/v1/pointnav_gibson_v1.zip
unzip pointnav_gibson_v1.zip
ln -s /mnt/disk2/nav/habitat_data/scene_datasets

11. Docs

pip install sphinx sphinxcontrib-apidoc sphinx_rtd_theme recommonmark

sphinx-quickstart
make html

12. Submodules

Here we give an example of how to add your repo as a submodule of the on-policy repo.

git submodule add https://github.com/zoeyuchao/habitat-api.git

# add source for syncing
git remote add dist_source https://github.com/facebookresearch/habitat-lab.git
git remote -v

If you want to sync the official updates, you can use the following commands.

git pull dist_source master
# after you fix any merge conflicts, push the result to the master branch
git push origin master

When you update your submodule, you also need to update the main repo, using the following command.

git submodule foreach git submodule update

¹ See the trpo branch.

Chaojidahoufeng/SGroup_RL