Denys88/rl_games

Why is the performance of GRU+PPO poor?

Closed this issue · 4 comments

The performance of your project is impressive. I want to know why CNN+PPO can work so well.
Thank you!

hi @hijkzzz the weirdest part is that TensorFlow performs much better on MMM2 with conv1d than PyTorch does.
If you take a look at DM/sc2_runs you can see that TensorFlow:
'python runner.py --train --file rl_games/configs/smac/MMM2.yaml --tf' trains in a few million steps, but
'python runner.py --train --file rl_games/configs/smac/MMM2_torch.yaml' trains much slower. I even reproduced the truncated normal initializer and still have the same issue.
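For reference, reproducing TensorFlow's truncated normal initializer in PyTorch can be sketched like this. This is not the project's actual code, just a minimal illustration using `torch.nn.init.trunc_normal_`, which truncates samples at two standard deviations the way `tf.truncated_normal_initializer` does; the `std=0.02` value and the layer types covered are assumptions for the example.

```python
import torch
import torch.nn as nn

def init_truncated_normal(module, std=0.02):
    # Mimic tf.truncated_normal_initializer: draw from N(0, std^2)
    # and reject/resample anything beyond 2 standard deviations.
    if isinstance(module, (nn.Linear, nn.Conv1d)):
        nn.init.trunc_normal_(module.weight, mean=0.0, std=std,
                              a=-2.0 * std, b=2.0 * std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Hypothetical conv1d+MLP head, just to show .apply() usage
net = nn.Sequential(nn.Conv1d(16, 32, 3), nn.ReLU(), nn.Flatten(),
                    nn.Linear(32 * 5, 4))
net.apply(init_truncated_normal)
```

If the two frameworks still diverge after matching the initializer, the remaining differences are usually elsewhere (e.g. default epsilon in Adam, or padding/normalization defaults).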

But I've found that a simple MLP is enough to solve most of the envs.
You can find configs in 'rl_games/configs/smac/runs'.
I'll merge DM/sc2_runs in a few days. If you hit any issues in the current branch, they will be fixed there.


Thank you. I tested RNN+PPO in SMAC but got poor performance, so I am particularly curious which tricks make MLP or CNN+PPO work.

@hijkzzz I found a few bugs some time ago and fixed them. Going to rerun some RNN experiments.

@hijkzzz I've found a few more bugs in the RNN implementation; it now works much better.