This is a PyTorch implementation of Structured Exploration via Deep Hierarchical Coordination.
The experimental environment is a modified version of Waterworld, named `MAWaterWorld_modified_mixed`, based on MADRL.
The main features of the modified Waterworld environment (the ways it differs from the MADRL and MADDPG versions) are:
- Evaders and poisons now bounce off the walls according to physical rules.
- Evaders, pursuers, and poisons are now the same size, so random actions yield an average reward of around 0.
- Exactly `n_coop` agents are needed to catch food.
- Actions are discrete: up, down, left, right.
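
The wall-bounce and discrete-action features above can be sketched as follows. This is a minimal illustration, not the repository's code: the unit-square arena, the function name `bounce_off_walls`, and the action indices in `ACTION_TO_DIRECTION` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical mapping from a discrete action index to a unit 2-D direction.
# The order (up, down, left, right) follows the feature list; the indices
# actually used by the environment may differ.
ACTION_TO_DIRECTION = {
    0: np.array([0.0, 1.0]),   # up
    1: np.array([0.0, -1.0]),  # down
    2: np.array([-1.0, 0.0]),  # left
    3: np.array([1.0, 0.0]),   # right
}


def bounce_off_walls(position, velocity, low=0.0, high=1.0):
    """Reflect a moving object elastically off axis-aligned walls.

    Assumes the arena is the square [low, high]^2 and that the object
    overshoots a wall by less than the arena width in a single step.
    """
    position = np.asarray(position, dtype=float).copy()
    velocity = np.asarray(velocity, dtype=float).copy()
    for axis in range(2):
        if position[axis] < low:
            # Mirror the overshoot back inside and flip the velocity component.
            position[axis] = 2 * low - position[axis]
            velocity[axis] = -velocity[axis]
        elif position[axis] > high:
            position[axis] = 2 * high - position[axis]
            velocity[axis] = -velocity[axis]
    return position, velocity
```

For example, an evader that steps past the right wall to `x = 1.1` with velocity `(0.2, 0.0)` would be mirrored back inside the arena with its horizontal velocity reversed.
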
- Install MADRL.
- Replace the files in the `madrl_environments/pursuit` directory with the ones in this repo.

Running `python main.py` starts the training.
The two agents must cooperate to catch the food, which yields a reward of 10.
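
The cooperative catch rule might look roughly like the sketch below. The function name `cooperative_food_reward`, the `catch_radius` parameter, and the distance-based contact test are assumptions for illustration only. The "exactly `n_coop`" comparison follows the wording of this README, and how the reward is split among agents is not specified here.

```python
import numpy as np


def cooperative_food_reward(pursuers, evader, catch_radius, n_coop=2, reward=10.0):
    """Return the food reward if exactly n_coop pursuers touch the evader.

    pursuers: array-like of shape (n_agents, 2) with pursuer positions.
    evader: array-like of shape (2,) with the food (evader) position.
    catch_radius: hypothetical contact distance between a pursuer and the food.
    """
    pursuers = np.asarray(pursuers, dtype=float)
    evader = np.asarray(evader, dtype=float)
    # Euclidean distance from every pursuer to the evader.
    distances = np.linalg.norm(pursuers - evader, axis=1)
    n_touching = int(np.sum(distances <= catch_radius))
    return reward if n_touching == n_coop else 0.0
```

With two pursuers on either side of a food item and within `catch_radius` of it, the function returns 10.0. If only one of them is close enough, it returns 0.0.
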