This code implements and evaluates algorithms for "Robust Reinforcement Learning Under Minimax Regret for Green Security" from UAI-21, including the MIRROR algorithm introduced in the paper. If you use this code, please cite:
@inproceedings{xu2021robust,
title={Robust Reinforcement Learning Under Minimax Regret for Green Security},
author={Xu, Lily and Perrault, Andrew and Fang, Fei and Chen, Haipeng and Tambe, Milind},
booktitle={Proc.~37th Conference on Uncertainty in Artificial Intelligence (UAI-21)},
year={2021},
}
This project is licensed under the terms of the MIT license.
Due to the sensitive nature of poaching data, we provide dummy data for the park simulator rather than the real-world poacher behavioral data used in the paper experiments.
To run one complete execution of MIRROR to learn an optimal agent strategy (defender policy) with default settings, execute:
python double_oracle.py
To vary the settings, use the options:
python double_oracle.py --seed 0 --n_eval 100 --agent_train 100 --nature_train 100 --max_epochs 5 --n_perturb 3 --wake 1 --freeze_policy_step 5 --freeze_a_step 5 --height 5 --width 5 --budget 5 --horizon 5 --interval 3 --wildlife 1 --deterrence 1 --prefix ''
The options used to configure the wildlife park (the settings varied in Figure 4) are
- `horizon` - horizon for planning patrols, `H`
- `height`, `width` - set the size of the park; height x width = `N` in the paper
- `budget` - budget for ranger resources, `B` in the paper
- `interval` - uncertainty interval size
- `deterrence` - deterrence strength, `beta`
- `wildlife` - initial wildlife distribution
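For example, to run MIRROR on a larger park with a bigger budget and a longer patrol horizon (the values here are illustrative only, not settings used in the paper):

python double_oracle.py --height 10 --width 10 --budget 10 --horizon 10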
The options used to configure the MIRROR algorithm (including the RL oracles) are
- `seed` - random seed
- `n_eval` - number of timesteps to run to evaluate average reward
- `agent_train` - number of iterations to train agent DDPG
- `nature_train` - number of iterations to train nature DDPG, `J` in Algorithm 2
- `max_epochs` - number of epochs to run MIRROR
- `n_perturb` - number of perturbations, `O` in Algorithm 1
- `wake` - (binary) whether to use wake/sleep
- `freeze_policy_step` - how often to freeze policy parameters, `kappa` in Algorithm 2
- `freeze_a_step` - number of steps before unfreezing attractiveness parameters, `kappa` in Algorithm 2
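These are standard command-line flags. The sketch below shows how the two option groups might be declared with Python's `argparse`; it is a minimal illustration whose types and defaults are assumptions taken from the example command above, not necessarily the actual definitions in `double_oracle.py`.

```python
import argparse

# Hypothetical sketch of the CLI; defaults and types are assumptions, not the repo's code.
parser = argparse.ArgumentParser(description='MIRROR double oracle')

# wildlife park configuration (settings varied in Figure 4)
parser.add_argument('--height', type=int, default=5, help='park height; height x width = N')
parser.add_argument('--width', type=int, default=5, help='park width')
parser.add_argument('--budget', type=int, default=5, help='ranger budget B')
parser.add_argument('--horizon', type=int, default=5, help='patrol planning horizon H')
parser.add_argument('--interval', type=int, default=3, help='uncertainty interval size')
parser.add_argument('--wildlife', type=float, default=1, help='initial wildlife distribution')
parser.add_argument('--deterrence', type=float, default=1, help='deterrence strength beta')

# MIRROR / RL oracle configuration
parser.add_argument('--seed', type=int, default=0, help='random seed')
parser.add_argument('--n_eval', type=int, default=100, help='timesteps to evaluate average reward')
parser.add_argument('--agent_train', type=int, default=100, help='agent DDPG training iterations')
parser.add_argument('--nature_train', type=int, default=100, help='nature DDPG training iterations (J)')
parser.add_argument('--max_epochs', type=int, default=5, help='number of MIRROR epochs')
parser.add_argument('--n_perturb', type=int, default=3, help='number of perturbations (O)')
parser.add_argument('--wake', type=int, default=1, help='whether to use wake/sleep (binary)')
parser.add_argument('--freeze_policy_step', type=int, default=5, help='how often to freeze policy parameters')
parser.add_argument('--freeze_a_step', type=int, default=5, help='steps before unfreezing attractiveness')
parser.add_argument('--prefix', type=str, default='', help='run name prefix (purpose assumed)')

args = parser.parse_args()
```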
The files in this repository are:
- `double_oracle.py` - executes the whole MIRROR process
- `agent_oracle.py` - implements the agent RL oracle (learns the agent best response to nature's mixed strategy)
- `nature_oracle.py` - implements the nature RL oracle (learns attractiveness values and an alternate policy in response to the agent's mixed strategy)
- `park.py` - implements the park environment described in Section 3.2
- `ddpg.py` - implements deep deterministic policy gradient (DDPG), used by the agent oracle
- `ddpg_nature.py` - implements the version of DDPG used by the nature oracle
- `min_reward_oracle.py` - implements an oracle used by the RARL baseline
- `nfg_solver.py` - normal-form game solver used by the double oracle to solve for equilibria
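To give a sense of how these pieces fit together, below is a toy, self-contained illustration of the double oracle idea on a small matrix game: repeatedly solve the restricted game over the strategies found so far (the role of `nfg_solver.py`, here via nashpy), then add each player's best response to the opponent's mixed strategy (the role of the agent and nature oracles, which in MIRROR are RL-based rather than the simple argmaxes used here). The payoff matrix and all names below are made up for illustration and do not come from this repository.

```python
import numpy as np
import nashpy as nash

# Toy zero-sum payoff matrix (rows = agent pure strategies, columns = nature
# pure strategies); values are illustrative only.
full_game = np.array([[3.0, 1.0, 0.0],
                      [0.0, 2.0, 3.0],
                      [1.0, 1.0, 2.0]])

# Start the double oracle with one pure strategy per player.
agent_set, nature_set = [0], [0]

for epoch in range(5):
    # Restricted game over the strategies found so far.
    restricted = full_game[np.ix_(agent_set, nature_set)]
    if restricted.shape == (1, 1):
        agent_mix, nature_mix = np.array([1.0]), np.array([1.0])
    else:
        # Mixed-strategy equilibrium of the restricted zero-sum game.
        agent_mix, nature_mix = next(nash.Game(restricted).support_enumeration())

    # "Oracles": best responses to the opponent's mixture over the full game.
    agent_payoffs = full_game[:, nature_set] @ nature_mix   # agent maximizes
    best_agent = int(np.argmax(agent_payoffs))
    nature_payoffs = agent_mix @ full_game[agent_set, :]    # nature minimizes agent payoff
    best_nature = int(np.argmin(nature_payoffs))

    if best_agent in agent_set and best_nature in nature_set:
        break  # no new best responses: equilibrium of the full game reached
    if best_agent not in agent_set:
        agent_set.append(best_agent)
    if best_nature not in nature_set:
        nature_set.append(best_nature)

print('agent strategies:', agent_set, 'mixture:', agent_mix)
print('nature strategies:', nature_set, 'mixture:', nature_mix)
```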
Requirements:
- python 3.6
- pytorch 1.0.1
- matplotlib 3.2.2
- numpy 1.15.3
- pandas 1.0.5
- scikit-learn 0.23.2
- scipy 1.5.3
- nashpy 0.0.19