Simplify dense reward calculation if reward_only_positive is true
douglasrizzo opened this issue · 1 comments
In this part of the dense reward calculation method, the damage and deaths of ally units are accumulated into the variables `delta_ally` and `delta_deaths`, which are used to compose the reward later. Notice how `delta_deaths` is only changed if `self.reward_only_positive` is false:
smac/smac/env/starcraft2/starcraft2.py
Lines 684 to 701 in a185b70
When the reward is calculated from the previously accumulated values, `delta_ally` is only used if `self.reward_only_positive` is false. The version of `delta_deaths` that is altered in the ally loop above is also only used if `self.reward_only_positive` is false.
smac/smac/env/starcraft2/starcraft2.py
Lines 716 to 719 in a185b70
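To make the reasoning above concrete, here is a minimal sketch of the logic as described in this issue. It is not the code from `starcraft2.py`: the `AllyStep` record, the `sketch_reward` function, its parameters, and the default values are all hypothetical, and the enemy contribution is passed in as precomputed numbers.

```python
from dataclasses import dataclass


@dataclass
class AllyStep:
    """Hypothetical per-step record for one ally unit (not SMAC's data model)."""
    prev_health: float  # health + shield on the previous step
    health: float       # health + shield on the current step


def sketch_reward(allies, delta_enemy, enemy_death_bonus, reward_only_positive,
                  death_value=10.0, neg_scale=0.5):
    """Simplified stand-in for the dense reward described in the issue."""
    delta_ally = 0.0
    delta_deaths = enemy_death_bonus  # assumed to come from the enemy loop

    # Ally loop: accumulate damage taken and death penalties.
    for unit in allies:
        if unit.prev_health == 0:
            continue  # already dead before this step
        if unit.health == 0:
            # Unit just died: the death penalty applies only when
            # negative rewards are enabled.
            if not reward_only_positive:
                delta_deaths -= death_value * neg_scale
            delta_ally += unit.prev_health * neg_scale
        else:
            # Unit still alive: accumulate scaled damage taken.
            delta_ally += neg_scale * (unit.prev_health - unit.health)

    if reward_only_positive:
        # delta_ally is ignored and delta_deaths was never touched
        # by the ally loop on this path.
        return abs(delta_enemy + delta_deaths)
    return delta_enemy + delta_deaths - delta_ally
```

In this simplified model, the ally loop has no effect on the returned value when `reward_only_positive` is true. The sketch deliberately leaves out any other state the real method may read or update, so it illustrates the argument but cannot confirm it for the actual SMAC code.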
This leads me to conclude that ally units only need to be processed in this method if `self.reward_only_positive` is false; otherwise the first loop can be skipped. I don't know how much this would affect performance (this is a method that runs on every game step, after all), but I came up with this simplified version. I'd like others to confirm whether my reasoning is correct.
Actually, I did a quick test and got different results after changing the method, so my assumptions were wrong.