Parallel_runner and episode_runner show obvious reward difference for the same test_battle_won_mean

Question

Closed this issue 4 years ago · 1 comment

Hi, thank you for your amazing contribution! I'm doing some research based on QMIX that may use both parallel_runner and episode_runner in the same training stage. On map MMM2, I got a reward of around 18 with parallel_runner when test_battle_won_mean reached 85%, while episode_runner only produced a reward of around 11 at a similar test_battle_won_mean. Can you tell me what the crucial difference between the two runners is that produces the different rewards? By the way, it seems that parallel_runner, with its 8x sample count, performs worse when trained for 2 million steps on MMM2. Could you please shed some light on this? Thanks a lot!

Answer 1 · 2020-11-18T08:53:57.000Z

It seems that I was using an older version of SMAC, which may count the enemy health twice when computing the max reward if the first attempt at init_unit fails. After upgrading to the new SMAC, the problem disappeared.
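
For anyone hitting the same thing, here is a rough sketch of why a doubled enemy health term would lower the reported reward at the same win rate. It assumes, as I understand it, that the environment divides the raw reward by max_reward / scale_rate, where max_reward is built from a per-kill bonus, a win bonus, and the summed enemy health. The function names, the enemy health values, and the default constants below are all hypothetical and chosen only for illustration; this is not the SMAC code itself.

```python
# Illustrative only: names, constants, and health values are assumptions,
# not taken from the SMAC source.

def max_reward(enemy_health, death_value=10.0, win_bonus=200.0, health_passes=1):
    """Upper bound on raw episode reward.

    health_passes=2 mimics the suspected bug, where the enemy health is
    added a second time after a failed unit initialisation.
    """
    n_enemies = len(enemy_health)
    return n_enemies * death_value + win_bonus + health_passes * sum(enemy_health)


def scaled_return(raw_return, max_r, scale_rate=20.0):
    """Normalise a raw episode return so a perfect episode scores scale_rate."""
    return raw_return / (max_r / scale_rate)


enemy_health = [100.0] * 10  # hypothetical enemy team

correct_max = max_reward(enemy_health)
inflated_max = max_reward(enemy_health, health_passes=2)

# A perfect episode earns the true maximum raw reward in both cases;
# only the normaliser differs.
raw = correct_max
scaled_correct = scaled_return(raw, correct_max)
scaled_inflated = scaled_return(raw, inflated_max)

print(scaled_correct, scaled_inflated)
```

With the inflated normaliser the same raw return maps to a noticeably smaller scaled value, so two runs can show a similar test_battle_won_mean but quite different rewards if only one of them hits the double counting.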