Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning
Code that accompanies paper: Killian JA, Xu L, Biswas A, Tambe M. Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning. Arxiv.
Main file for RMABDO, the algorithm for Robust Multi-action RMAB is double_oracle.py
Main file for RMABPPO, the algorithm for Deep RL for binary and multi-action RMAB is agent_oracle.py
Main file for MA-RMABPPO, the algorithm for Multi-agent Deep RL for binary and multi-action RMAB is nature_oracle.py
Note: These are generic steps to install what's needed to run the code. However, please see the bottom of the readme for verified/reproducible setup instructions that start from a new digital ocean linux server.
- Clone the repo:
git clone https://github.com/killian-34/RobustRMAB.git
- Install the repo:
pip3 install -e .
- Create the directory structure:
bash make_dirs.sh
If you have any issues with mpi4py or tkinter, try these steps (on linux)
apt-get update
sudo apt install python3-mpi4py
sudo apt-get install python3-tk
Then run pip3 install -e .
again once these successfully install.
To test that the install worked correctly for RMABPPO, run:
bash run/job.run_rmabppo_tiny.sh
To test that the install worked correctly for RMABDO, run:
bash run/job.run_do_tiny.sh
To run a basic example of the double oracle framework using the counterexample
domain from the paper, run
bash run/job.run_do_counterexample.sh
The parameter settings for RMABDO and the underlying algorithms for the Agent Oracle (RMABPPO) and Nature Oracle (MA-RMABPPO) are set in the corresponding file job.run_do_counterexample.sh
. More details on the parameters can be found by running
python3 double_oracle.py --help
Note that this runs the code without the hawkins baselines since these require installation of gurobi and gurobipy. To run the baselines, first install gurobi and gurobipy (you can obtain a free academic license). Detailed instructions on how to install these are given at the bottom of the readme in the "Setup steps, Digital Ocean install" section.
then in run/job.run_do_tiny.sh
or run/job.run_do_counterexample.sh
change no_hawkins=1
to no_hawkins=0
to run the hawkins baselines.
To run a basic example of RMABPPO using the counterexample
domain from the paper, run
bash run/job.run_rmabppo_counterexample.sh
The parameter settings for RMABPPO are set in the corresponding file job.run_rmabppo_counterexample.sh
. More details on the parameters can be found by running
python3 agent_oracle.py --help
Again, this runs the code without the hawkins baseline since it requires installation of gurobi and gurobipy. To run the baselines, first install gurobi and gurobipy (you can obtain a free academic license). Detailed instructions on how to install these are given at the bottom of the readme in the "Setup steps, Digital Ocean install" section.
then in run/job.run_rmabppo_tiny.sh
or run/job.run_rmabppo_counterexample.sh
change no_hawkins=1
to no_hawkins=0
to run the hawkins baseline.
- The implementation of RMABPPO (Agent Oracle) can be found in
robust_rmab/algos/rmabppo/rmabppo_core.py
- The implementation of MA-RMABPPO (Nature Oracle) can be found in
robust_rmab/algos/ma_rmabppo/ma_rmabppo_core.py
- The double oracel portion of the codebase is currently not setup to allow multi processing from the double_oracle.py script. However, the underyling RL code was built on the SpinningUp repository which supports MPI and so the code is entangled with MPI-based function calls, meaning MPI must still be installed to run double_oracle.py. This will be addressed in future iterations.
- However, standalone RMABPPO is setup for multi-processing and can be controlled with the
--cpu
option inagent_oracle.py
Tested as of July 3, 2021
- Create a droplet on Digital Ocean using the following specifications
4 GB Memory / 2 Intel vCPUs / 80 GB Disk / NYC1 - Ubuntu 20.04 (LTS) x64
- After hitting Create Droplet, give the server a minute to spin up, then go to your terminal and ssh into the server using the ipv4 address that Digital Ocean assigned to the droplet, e.g.,
ssh -Y root@000.000.000.00
Once connecte to the server, run the following commands:
git clone https://github.com/killian-34/RobustRMAB.git
cd RobustRMAB
- Create directory structure:
bash make_dirs.sh
apt-get update
apt install python3-pip --fix-missing
sudo apt install python3-mpi4py
sudo apt-get install python3-tk
pip install -e .
- Run this command to test RMABPPO:
bash run/job.run_rmabppo_tiny.sh
- Run this command to test RMABDO:
bash run/job.run_do_tiny.sh
If you want to run Hawkins policies, e.g., no_hawkins=0
, you need to install gurobi and gurobipy by following these steps
- In a web browser Register for a Gurobi account or login at https://www.gurobi.com/downloads/end-user-license-agreement-academic/
- Navigate to https://www.gurobi.com/downloads/ and select
Gurobi Optimizer
- Review the EULA, then click
I accept the End User License Agreement
- Identify the latest installation... as of writing, it is 9.1.2, and the following commands will reflect that. However, if the latest version has changed, you can replace 9.1.2 in the following commands with the newer version number and/or links on the Gurobi website.
- Navigate to
https://www.gurobi.com/wp-content/uploads/2021/04/README_9.1.2.txt
and read the README - Back on the digial ocean server terminal:
mkdir tools
cd tools
wget https://packages.gurobi.com/9.1/gurobi9.1.2_linux64.tar.gz
tar xvzf gurobi9.1.2_linux64.tar.gz
- Add the following lines to your ~/.bashrc file, e.g., via
vim ~/.bashrc
export GUROBI_HOME="/root/tools/gurobi912/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
- Run
source ~/.bashrc
- On the browser, navigate to https://www.gurobi.com/downloads/end-user-license-agreement-academic/
- Review the conditions, then click
I accept these conditions
- Scroll down to Installation and copy the command that looks like
grbgetkey 00000000-0000-0000-0000-000000000000
, then paste and run in the server terminal window - Enter
Y
to select the default options when prompted cd ~/RobustRMAB/
pip install gurobipy
vim run/job.run_do_tiny.sh
- Change
no_hawkins=1
tono_hawkins=0
bash run/job.run_do_tiny.sh
vim run/job.run_rmabppo_tiny.sh
- Change
no_hawkins=1
tono_hawkins=0
bash run/job.run_rmabppo_tiny.sh