This repository is a TensorFlow implementation of the paper DPIQN: Deep Policy Inference Q-Network.
- Python 3.4+
- pygame-soccer
- tensorflow 1.2.1
- tensorpack
Simply enter the following command to train a DPIQN agent:
python src/train_dpiqn.py
The following optional arguments let you customize training:
--gpu comma-separated list of GPU(s) to use
--load load model
--log training log directory
--task task to perform {play, eval, train}
--algo algorithm for computing Q-value {DQN, Double, Dueling}
--mode specify ai mode in env (can be list) {offensive, defensive}
--mt_mode multi-task setting {coop-only,opponent-only,all}
--mt use 2v2 env
--skip action repeat
--hist_len history length
--batch_size batch size (default: 32)
--lr initial learning rate (default: 1e-3)
--rnn use an RNN (DRPIQN)
--lr_sched learning rate schedule (default: 600:4e-4,1000:2e-4)
--eps_sched epsilon decay schedule (default: 100:0.1,3200:0.01)
--reg reg
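As a rough illustration of how a subset of these flags could be declared, here is a minimal argparse sketch. It is not the code in src/train_dpiqn.py: the function name build_parser is hypothetical, treating --mt as a boolean switch is an assumption based on the example command, and only the choices and defaults listed above are used.

```python
import argparse


def build_parser():
    """Hypothetical parser covering a subset of the flags listed above."""
    parser = argparse.ArgumentParser(
        description="Illustrative DPIQN-style command line (not the repository's code)"
    )
    parser.add_argument("--gpu", help="comma-separated list of GPU(s) to use")
    parser.add_argument("--load", help="load model")
    parser.add_argument("--task", choices=["play", "eval", "train"])
    parser.add_argument("--algo", choices=["DQN", "Double", "Dueling"])
    # Assumption: --mt is a switch that takes no value.
    parser.add_argument("--mt", action="store_true", help="use 2v2 env")
    parser.add_argument("--mt_mode", choices=["coop-only", "opponent-only", "all"])
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--lr_sched", default="600:4e-4,1000:2e-4")
    parser.add_argument("--eps_sched", default="100:0.1,3200:0.01")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--gpu=1", "--mt", "--mt_mode=coop-only"])
print(args.gpu, args.mt, args.mt_mode, args.batch_size)
```

With a parser like this, an unsupported value such as --mt_mode=solo would be rejected before training starts.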
For example, if you run the following command:
python src/train_dpiqn.py --gpu=1 --mt --mt_mode=coop-only --eps_sched='100:0.1,3200:0.01'
Then it will start training a DPIQN model in the 2 vs. 2 soccer game, and the agent will only infer its collaborator's policy. In addition, the eps parameter for epsilon-greedy exploration will decrease to 0.1 at epoch 100, and then to 0.01 at epoch 3200.
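The schedule strings passed to --eps_sched and --lr_sched are comma-separated epoch:value pairs. The sketch below shows one way such a string could be turned into a list of (epoch, value) tuples. The helper name parse_schedule is hypothetical and the repository may parse these strings differently; how values are interpolated between the listed epochs is not covered here.

```python
def parse_schedule(spec):
    """Turn a string like '100:0.1,3200:0.01' into sorted (epoch, value) pairs."""
    points = []
    for item in spec.split(","):
        epoch, value = item.split(":")
        points.append((int(epoch), float(value)))
    # Sort by epoch so the schedule can be scanned in order.
    return sorted(points)


print(parse_schedule("100:0.1,3200:0.01"))   # [(100, 0.1), (3200, 0.01)]
print(parse_schedule("600:4e-4,1000:2e-4"))  # [(600, 0.0004), (1000, 0.0002)]
```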
To test the model, enter the command:
python src/train_dpiqn.py --load=[path_to_model] --task=eval
The model will be evaluated for 100,000 episodes. In addition, you can use the following command to watch how your agent plays:
python src/train_dpiqn.py --load=[path_to_model] --task=play
Note that you can also use the optional arguments listed in the Training section.