Denys88/rl_games

Implementation and performance questions

mmcaulif opened this issue · 1 comment

I just have some questions about your implementation of MAPPO.

  1. Are there any other major changes in your implementation other than the use of convolutional networks? I know the paper used RNNs, n-step learning and parallel environments.
  2. Have you compared the performance of your selected hyperparameters on other MAPPO implementations?

Just wondering, since solving the simple environments in ~150k timesteps is significantly better than anything I could achieve in my own research while tuning MAPPO, so I'm hoping for some tips/pointers :)

Hi @mmcaulif, my baseline was implemented a long time ago in TensorFlow: https://github.com/Denys88/rl_games/tree/0871084d8d95954fa165dbe93eadb54773b7a36a
The main difference is that I just stacked 4 frames and used conv1d.
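For illustration, a minimal PyTorch sketch of what frame stacking plus conv1d can look like; the layer sizes and class name here are assumptions, not the actual rl_games (or old TensorFlow) code:

```python
import torch
import torch.nn as nn

class FrameStackConv1dNet(nn.Module):
    """Hypothetical sketch: treat the 4 stacked frames as Conv1d channels
    instead of using an RNN over time."""

    def __init__(self, obs_dim: int, num_frames: int = 4, hidden: int = 128):
        super().__init__()
        # Input shape: (batch, num_frames, obs_dim)
        self.conv = nn.Sequential(
            nn.Conv1d(num_frames, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * obs_dim, hidden),
            nn.ReLU(),
        )

    def forward(self, stacked_obs: torch.Tensor) -> torch.Tensor:
        # stacked_obs: (batch, num_frames, obs_dim) -> (batch, hidden)
        return self.head(self.conv(stacked_obs))
```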

I have a lot of different PPO experiments on PyTorch, including a central value function and LSTM, but there are cases where my old implementation or the MAPPO paper is better.
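For context, a rough sketch of what a central value (centralized critic) can look like, assuming the critic sees the concatenated observations of all agents while each actor only sees its own; the names and sizes are illustrative, not the rl_games implementation:

```python
import torch
import torch.nn as nn

class CentralValueNet(nn.Module):
    """Hypothetical sketch: one value estimate from the joint (global) state."""

    def __init__(self, obs_dim: int, num_agents: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * num_agents, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single value for the joint state
        )

    def forward(self, all_obs: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, num_agents, obs_dim) -> flatten agents into one global state
        return self.net(all_obs.flatten(start_dim=1))
```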

In the MAPPO paper they made some pretty interesting improvements which I didn't implement in my repo: global state tuning and death masking.
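To illustrate the death masking idea (again, not something implemented in this repo), a minimal sketch: a dead agent's features are zeroed out before they reach the critic, so the value function doesn't condition on stale observations. The function name and shapes are assumptions for illustration:

```python
import torch

def apply_death_mask(agent_states: torch.Tensor, alive: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of death masking.

    agent_states: (batch, num_agents, feat) per-agent features for the critic.
    alive:        (batch, num_agents) with 1.0 for alive agents, 0.0 for dead ones.
    Returns the features with dead agents' entries zeroed out.
    """
    return agent_states * alive.unsqueeze(-1)
```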

Overall this SC2 benchmark is pretty strange and might depend a lot on the initial action distribution. For example, if moving left has the highest probability for all units on an untrained neural network, it might make training much faster.