Implementation for the DeDOL algorithm proposed in 'Deep Reinforcement Learning for Green Security Games with Real-Time Information', AAAI 2019.
For more details on the algorithm, please refer to the paper 'Deep Reinforcement Learning for Green Security Games with Real-Time Information'.
- TensorFlow (GPU version)
- cvxopt
- nashpy
- the GSG-I game model class
- the main file for running the DeDOL algorithms
- helper functions for
- for loading the models trained in local modes and then running more iterations of global-mode training
- helper functions for showing the game using GUI
- test the performance of trained DQNs using GUI.
- helper functions for generating different kinds of maps
- the patroller CNN strategy representation
- the poacher CNN strategy representation
- our designed heuristic parameterized random walk patroller
- our designed heuristic random sweeping patroller
- the replay buffer data structure needed for DQN training and prioritized experience replay
- AC_patroller: the actor-critic patroller. It performs poorly and is not adopted in the DeDOL algorithm.
Most of the files include further detailed comments.
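For readers unfamiliar with the replay buffer mentioned in the list above, the following is a minimal sketch of the general idea, not the implementation in this repository. The class name `ReplayBuffer` and its methods are hypothetical, and it samples uniformly rather than by priority.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer of transitions with uniform sampling (illustrative only)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        """Store one transition."""
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a uniformly sampled batch as five parallel tuples."""
        batch = random.sample(list(self._storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self._storage)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.add(step, 0, 1.0, step + 1, False)
    print(len(buffer), buffer.sample(2))
```

Prioritized experience replay replaces the uniform `random.sample` call with sampling proportional to a per-transition priority; that part is deliberately left out of this sketch.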
First run for different local modes or pure global mode.
- The default training parameters should work well. You can also explore by yourself.
- To run in different local modes, set the 'po_location' parameter to a value from 0 to 3, representing the four different entering points. The code will automatically generate new directories for saving the DQN models trained in the different local modes, for later loading in the file.
- E.g. the command 'python --row_num 5 --po_location 0 --map_type gauss' runs the DeDOL algorithm on a 5x5 grid with a mixture-of-Gaussians map, where the poacher always enters the grid world from the top-left corner. The trained DQNs are stored in the directory './Results_55_gauss_mode0/'.
- Training the DQNs can be very time-consuming in the complicated GSG-I game, and several iterations of DeDOL are required to evolve a reasonable strategy profile. Be patient :).
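The relationship between the command-line arguments and the results directory can be sketched as below. This is only an illustration: the helper names `local_mode_dir` and `local_mode_args` are hypothetical, the naming pattern is inferred from the example paths in this README, and a square grid is assumed. The script name is left out on purpose.

```python
def local_mode_dir(row_num, map_type, po_location, col_num=None):
    """Guess the results directory for one local mode (hypothetical helper)."""
    if po_location not in range(4):
        raise ValueError("po_location must be 0, 1, 2 or 3")
    if col_num is None:
        col_num = row_num  # assumption: square grid, as in the README examples
    return "./Results_{}{}_{}_mode{}/".format(row_num, col_num, map_type, po_location)


def local_mode_args(row_num, map_type):
    """Argument lists for the four local modes (script name intentionally omitted)."""
    return [
        ["--row_num", str(row_num), "--po_location", str(k), "--map_type", map_type]
        for k in range(4)
    ]


if __name__ == "__main__":
    for k, args in enumerate(local_mode_args(5, "gauss")):
        print(" ".join(args), "->", local_mode_dir(5, "gauss", k))
```

Each argument list corresponds to one training run, so all four must finish before the global-mode step has a full set of local-mode DQNs to collect.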
To collect the DQNs and run more DO iterations in global mode:
- You should first run in all local modes.
- Set the load_path parameter to match the save_path parameter you used during local-mode training, so that the previously trained DQNs are loaded. The load_path should omit the final number that specifies the mode, because the DQNs trained in all local modes are collected automatically. E.g. if the save_path values run from ./Results_33_random_mode0/ to ./Results_33_random_mode3/, the load_path should be ./Results_33_random_mode.
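The load_path convention above can be checked with a small sketch before starting a long global-mode run. The function names here are hypothetical and not part of the repository; the sketch assumes four local modes numbered 0 to 3, as described earlier.

```python
import os
import re


def load_path_from_save_path(save_path):
    """Drop the trailing slash and mode digit from a local-mode save_path."""
    trimmed = save_path.rstrip("/")
    match = re.fullmatch(r"(.*_mode)[0-3]", trimmed)
    if match is None:
        raise ValueError("save_path does not end in '_mode<0-3>': " + save_path)
    return match.group(1)


def local_mode_dirs(load_path, num_modes=4):
    """Directories that are expected to exist for a given load_path."""
    return ["{}{}/".format(load_path, k) for k in range(num_modes)]


def missing_local_modes(load_path, num_modes=4):
    """Expected local-mode directories that are not present on disk."""
    return [d for d in local_mode_dirs(load_path, num_modes) if not os.path.isdir(d)]


if __name__ == "__main__":
    load_path = load_path_from_save_path("./Results_33_random_mode0/")
    print(load_path)
    print("missing:", missing_local_modes(load_path))
```

An empty result from `missing_local_modes` suggests that every local mode has been run at least far enough to create its directory; it does not verify that the saved models themselves are complete.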
To visualize the game process:
- Running with the 'load' argument set to False visualizes the behaviour of a parameterized poacher and a random sweeping patroller. You can change parameters such as 'row_num', 'map_type' and 'max_time' for fun.
- To visualize the performance of trained DQNs, run with the 'load' argument set to True, and set the corresponding 'pa_load_path' and 'po_load_path' arguments to the paths where you stored your DQN models.
- A pretrained patroller DQN against a heuristic parameterized poacher, and a pretrained poacher DQN against a random sweeping patroller (in a 7x7 grid world), are contained in the Pre-trained_Models directory.