oxwhirl/pymarl

Parallel_runner and episode_runner report clearly different rewards at the same test_battle_won_mean

Closed this issue · 1 comment

qyz55 commented

Hi, thank you for your amazing contribution! I'm doing some research based on QMIX that may use both the parallel runner and the episode runner in the same training stage. On the MMM2 map, parallel_runner gives a reward of around 18 when test_battle_won_mean reaches 85%, while episode_runner only gives a reward of around 11 at a similar test_battle_won_mean. Can you tell me what the crucial difference between the two runners is that produces these different rewards? Also, parallel_runner with 8× the number of samples seems to perform worse when trained for 2 million steps on MMM2. Could you shed some light on this? Thanks a lot!

qyz55 commented

It turned out I was using an older version of SMAC, which could count the enemies' health twice when computing the maximum reward if the first attempt at init_unit failed. After upgrading to the newer SMAC, the problem disappeared.
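To see why double-counting enemy health in the maximum reward lowers the reported reward, here is a minimal sketch. It assumes a SMAC-style scheme where `max_reward = n_enemies * reward_death_value + reward_win + sum(health_max + shield_max)` and every step's reward is divided by `max_reward / reward_scale_rate`. The defaults used here (10, 200, 20) are assumed to mirror SMAC's, and the function names and enemy stats are hypothetical, for illustration only. The exact bookkeeping may differ between SMAC versions.

```python
def max_reward(enemy_hp_plus_shield, reward_death_value=10.0,
               reward_win=200.0, double_counted=False):
    """Hypothetical reconstruction of how a SMAC-style env sizes its reward scale.

    Assumption: max_reward = n_enemies * reward_death_value + reward_win
    + the sum of every enemy's (health_max + shield_max).
    With double_counted=True the health/shield sum is added a second time,
    mimicking a retried unit initialisation that accumulates again.
    """
    hp_total = sum(enemy_hp_plus_shield)
    base = len(enemy_hp_plus_shield) * reward_death_value + reward_win
    return base + hp_total * (2 if double_counted else 1)


def scaled_return(raw_return, max_reward, reward_scale_rate=20.0):
    # Rewards are divided by (max_reward / reward_scale_rate), so an episode
    # whose raw return equals max_reward scores exactly reward_scale_rate.
    return raw_return / (max_reward / reward_scale_rate)


# Illustrative numbers only: 10 enemies with 100 health+shield each.
enemies = [100.0] * 10
correct = max_reward(enemies)                      # 10*10 + 200 + 1000 = 1300
buggy = max_reward(enemies, double_counted=True)   # 10*10 + 200 + 2000 = 2300

raw = correct  # suppose the agents collect every available reward in an episode
print(scaled_return(raw, correct))  # 20.0
print(scaled_return(raw, buggy))    # about 11.3
```

The same episode scores less under the inflated denominator, so two runs with the same win rate can report noticeably different rewards if only one of them hit the buggy initialisation path.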