This repository contains the code for our paper "Leveraging Fully Observable Policies for Learning under Partial Observability", accepted at CoRL 2022. BibTeX entry:
@article{nguyen2022leveraging,
title={Leveraging Fully Observable Policies for Learning under Partial Observability},
author={Nguyen, Hai and Baisero, Andrea and Wang, Dian and Amato, Christopher and Platt, Robert},
journal={arXiv preprint arXiv:2211.01991},
year={2022}
}
- Install Anaconda
- Create and activate a conda environment (Python 3.8.5)
conda create --name cosil python=3.8.5
conda activate cosil
- Clone this repository (including its submodules), enter it, and install the required packages
git clone --recursive https://github.com/hai-h-nguyen/cosil-corl22.git
cd cosil-corl22
pip install -r requirements.txt
- Install the POMDP domains
cd pomdp_robot_domains
pip install -r requirements.txt
pip install -e .
cd ..
cd pomdp-domains
pip install -e .
cd ..
- Install PyTorch (I used 1.12.0 with CUDA 10.2, but other versions should work)
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=10.2 -c pytorch
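- Add the repository root to PYTHONPATH (run this, and the training commands below, from the cosil-corl22 directory)
export PYTHONPATH=${PWD}:$PYTHONPATH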
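The export step puts the current directory on PYTHONPATH, and it has to be repeated in every new shell. A quick check before launching a long run can catch a forgotten export. The sketch below is a hypothetical helper (`on_pythonpath` is not part of the repository); it only inspects the variable and imports nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the cosil-corl22 repository.
# Succeeds (exit status 0) if DIR is one of the ':'-separated PYTHONPATH entries.
# Usage: on_pythonpath DIR
on_pythonpath() {
  local dir="$1"
  # Surround both sides with ':' so only whole entries match; this also covers
  # the trailing ':' left by the export when PYTHONPATH was previously empty.
  case ":${PYTHONPATH:-}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `on_pythonpath "$PWD" || echo "PYTHONPATH does not include $PWD"` run from the repository root warns when the export is missing.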
- Bumps-1D: COSIL (sacde) / Behavior Cloning (bcd) / Recurrent SAC (sacd) / Off-policy Advisor (sacda)
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo sacde --target_entropy 1.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo sacda --target_entropy 0.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo bcd --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo sacd --target_entropy 0.7 --seed 0 --cuda 0
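Each command trains a single seed. To repeat the Bumps-1D runs over several seeds, the commands can be generated rather than typed. This is a minimal sketch; `cosil_cmd` and `bumps1d_sweep` are hypothetical helpers, and seeds 0, 1 and 2 are an arbitrary choice. The functions only print commands, so nothing is launched until their output is piped to a shell.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the cosil-corl22 repository.

# Print one training command in the same form as the README commands.
# Usage: cosil_cmd CFG ALGO SEED [TARGET_ENTROPY] [CUDA]
# Leave TARGET_ENTROPY empty for algorithms run without it (e.g. bcd).
cosil_cmd() {
  local cfg="$1" algo="$2" seed="$3" ent="${4:-}" cuda="${5:-0}"
  local cmd="python3 policies/main.py --cfg ${cfg} --algo ${algo}"
  if [ -n "$ent" ]; then
    cmd+=" --target_entropy ${ent}"
  fi
  cmd+=" --seed ${seed} --cuda ${cuda}"
  printf '%s\n' "$cmd"
}

# Print (without running) the four Bumps-1D commands for seeds 0, 1 and 2,
# using the target entropies from the README.
bumps1d_sweep() {
  local cfg="configs/pomdp/bumps_1d/rnn.yml" seed
  for seed in 0 1 2; do
    cosil_cmd "$cfg" sacde "$seed" 1.0
    cosil_cmd "$cfg" sacda "$seed" 0.0
    cosil_cmd "$cfg" bcd "$seed"
    cosil_cmd "$cfg" sacd "$seed" 0.7
  done
}
```

Running `bumps1d_sweep | bash` from the repository root executes the twelve runs one after another; inspecting `bumps1d_sweep` on its own first is a cheap way to check the flags.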
- Bumps-2D: COSIL (sacde) / Behavior Cloning (bcd) / Recurrent SAC (sacd) / Off-policy Advisor (sacda)
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo sacde --target_entropy 1.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo sacda --target_entropy 0.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo bcd --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo sacd --target_entropy 0.7 --seed 0 --cuda 0
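The commands above run one seed at a time in the foreground. A minimal sketch for launching several seeds of one configuration in parallel is shown below. `run_seeds` and the `stdout_logs` folder are hypothetical (the folder only holds console output; it is not the repository's own logging). With `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the cosil-corl22 repository.
# Launch one run per seed in the background, send each run's console output to
# its own file, then wait for all of them.
# Usage: run_seeds CFG ALGO TARGET_ENTROPY SEED [SEED ...]
# Pass an empty TARGET_ENTROPY ("") for algorithms run without it (e.g. bcd).
# With DRY_RUN=1 the commands are printed instead of executed.
run_seeds() {
  local cfg="$1" algo="$2" ent="$3" seed
  shift 3
  local outdir="stdout_logs"  # hypothetical folder for console output only
  for seed in "$@"; do
    local -a cmd=(python3 policies/main.py --cfg "$cfg" --algo "$algo")
    if [ -n "$ent" ]; then
      cmd+=(--target_entropy "$ent")
    fi
    cmd+=(--seed "$seed" --cuda 0)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      mkdir -p "$outdir"
      "${cmd[@]}" > "${outdir}/${algo}_seed${seed}.txt" 2>&1 &
    fi
  done
  wait  # block until every background run has finished
}
```

For example, `run_seeds configs/pomdp/bumps_2d/rnn.yml sacde 1.0 0 1 2` starts three COSIL runs at once. All of them use `--cuda 0`, so they share one GPU; whether that fits in memory depends on the hardware.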
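- LunarLander (two configs, rnn_p.yml and rnn_v.yml; the commands use rnn_p.yml, swap in rnn_v.yml for the other): COSIL (sace) / Behavior Cloning (bc) / Recurrent SAC (sac) / Off-policy Advisor (saca)
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo sace --target_entropy 1.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo bc --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo sac --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo saca --seed 0 --cuda 0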
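LunarLander has two configs (rnn_p.yml and rnn_v.yml) and four algorithms, i.e. eight runs per seed. The sketch below prints all eight commands for one seed. `lunarlander_cmds` is a hypothetical helper; only sace gets `--target_entropy 1.0`, matching the README commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the cosil-corl22 repository.
# Print one command per (config, algorithm) pair for LunarLander.
# Usage: lunarlander_cmds [SEED]   (SEED defaults to 0)
lunarlander_cmds() {
  local seed="${1:-0}" variant algo extra
  for variant in rnn_p rnn_v; do
    for algo in sace bc sac saca; do
      extra=""
      # Only COSIL (sace) is given an explicit target entropy in the README.
      if [ "$algo" = sace ]; then
        extra=" --target_entropy 1.0"
      fi
      echo "python3 policies/main.py --cfg configs/pomdp/lunarlander/${variant}.yml --algo ${algo}${extra} --seed ${seed} --cuda 0"
    done
  done
}
```

As with any printed command list, `lunarlander_cmds 1 | bash` would run them sequentially from the repository root.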
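- Car Flag (continuous): COSIL (sace) / Behavior Cloning (bc) / Recurrent SAC (sac) / Off-policy Advisor (saca)
python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo sace --target_entropy -1.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo bc --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo sac --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo saca --seed 0 --cuda 0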
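- Block Picking: COSIL (sace) / Behavior Cloning (bc) / Recurrent SAC (sac) / Off-policy Advisor (saca)
python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo sace --target_entropy 0.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo bc --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo sac --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo saca --seed 0 --cuda 0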
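The COSIL algorithm name and target entropy differ per domain (sacde for the Bumps domains, sace elsewhere; target entropies 1.0, -1.0 or 0.0). The sketch below collects those README values in one lookup table. `cosil_for`, `COSIL_ALGO` and `COSIL_ENTROPY` are hypothetical names, and the associative arrays require bash 4 or newer (the bash 3.2 shipped with macOS lacks them).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the cosil-corl22 repository.
# COSIL algorithm names and target entropies copied from the README commands.
# Associative arrays need bash 4 or newer.
declare -A COSIL_ALGO=(
  [bumps_1d]=sacde [bumps_2d]=sacde
  [lunarlander]=sace [car_flag_continuous]=sace [blockpicking]=sace
)
declare -A COSIL_ENTROPY=(
  [bumps_1d]=1.0 [bumps_2d]=1.0
  [lunarlander]=1.0 [car_flag_continuous]=-1.0 [blockpicking]=0.0
)

# Print the COSIL command for DOMAIN (a folder under configs/pomdp).
# Usage: cosil_for DOMAIN [SEED] [CFG_FILE]
# CFG_FILE defaults to rnn.yml, or rnn_p.yml for lunarlander.
cosil_for() {
  local domain="$1" seed="${2:-0}" cfg_file="${3:-}"
  if [ -z "$domain" ] || [ -z "${COSIL_ALGO[$domain]+set}" ]; then
    echo "unknown domain: ${domain}" >&2
    return 1
  fi
  if [ -z "$cfg_file" ]; then
    if [ "$domain" = lunarlander ]; then cfg_file=rnn_p.yml; else cfg_file=rnn.yml; fi
  fi
  echo "python3 policies/main.py --cfg configs/pomdp/${domain}/${cfg_file} --algo ${COSIL_ALGO[$domain]} --target_entropy ${COSIL_ENTROPY[$domain]} --seed ${seed} --cuda 0"
}
```

For example, `cosil_for blockpicking 1` prints the Block Picking COSIL command for seed 1, and `cosil_for lunarlander 0 rnn_v.yml` selects the second LunarLander config.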
- Plot results with TensorBoard (replace folder_to_plot with the log folder you want to view)
tensorboard --logdir logs/folder_to_plot
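When runs to compare live in different folders, TensorBoard's `--logdir_spec` flag accepts comma-separated `NAME:PATH` pairs and shows each path under its short name. The sketch below builds that value. `logdir_spec` is a hypothetical helper, and the folder names in the example are placeholders, not paths the repository is known to create.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the cosil-corl22 repository.
# Build a value for TensorBoard's --logdir_spec flag from NAME=PATH arguments.
# Usage: logdir_spec NAME=PATH [NAME=PATH ...]
logdir_spec() {
  local spec="" pair name dir
  for pair in "$@"; do
    name="${pair%%=*}"  # text before the first '='
    dir="${pair#*=}"    # text after the first '='
    # Commas and colons are separators in --logdir_spec, so reject them in names.
    case "$name" in
      *[,:]*) echo "invalid run name: ${name}" >&2; return 1 ;;
    esac
    spec+="${spec:+,}${name}:${dir}"  # add ',' only between entries
  done
  printf '%s\n' "$spec"
}
```

Usage: `tensorboard --logdir_spec "$(logdir_spec cosil=logs/run_a sac=logs/run_b)"`. Paths that themselves contain commas are not supported by this sketch.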
This code is released under the MIT License.
This codebase evolved from pomdp-baselines.