Simplify dense reward calculation if reward_only_positive is true

In this part of the dense reward calculation method, the damage and deaths of ally units are accumulated into variables delta_ally and delta_deathsand used to compose the reward later. Notice how dealth_deaths is only changed if self.reward_only_positive is false:

smac/smac/env/starcraft2/starcraft2.py

Lines 684 to 701 in a185b70

    
           for al_id, al_unit in self.agents.items(): 
        
               if not self.death_tracker_ally[al_id]: 
        
                   # did not die so far 
        
                   prev_health = ( 
        
                       self.previous_ally_units[al_id].health 
        
                       + self.previous_ally_units[al_id].shield 
        
                   ) 
        
                   if al_unit.health == 0: 
        
                       # just died 
        
                       self.death_tracker_ally[al_id] = 1 
        
                       if not self.reward_only_positive: 
        
                           delta_deaths -= self.reward_death_value * neg_scale 
        
                       delta_ally += prev_health * neg_scale 
        
                   else: 
        
                       # still alive 
        
                       delta_ally += neg_scale * ( 
        
                           prev_health - al_unit.health - al_unit.shield 
        
                       )

When the reward is calculated using the previous accumulated values, delta_ally is only used if self.reward_only_positive is false. The version of delta_deaths that is altered in the ally loop above is also only used if self.reward_only_positive is false.

smac/smac/env/starcraft2/starcraft2.py

Lines 716 to 719 in a185b70

    
           if self.reward_only_positive: 
        
               reward = abs(delta_enemy + delta_deaths)  # shield regeneration 
        
           else: 
        
               reward = delta_enemy + delta_deaths - delta_ally

This makes me conclude that we only need to process ally units in this method if self.reward_only_positive is false, otherwise we can ignore the first loop. I don't know how much this would affect performance (this is a method that runs on every game step, after all) but I could come up with this simplified version. I'd just like others to validate if what I said is true.

Actually, I did a quick test and got different results after changing the method, so my assumptions were wrong.

	for al_id, al_unit in self.agents.items():
	if not self.death_tracker_ally[al_id]:
	# did not die so far
	prev_health = (
	self.previous_ally_units[al_id].health
	+ self.previous_ally_units[al_id].shield
	)
	if al_unit.health == 0:
	# just died
	self.death_tracker_ally[al_id] = 1
	if not self.reward_only_positive:
	delta_deaths -= self.reward_death_value * neg_scale
	delta_ally += prev_health * neg_scale
	else:
	# still alive
	delta_ally += neg_scale * (
	prev_health - al_unit.health - al_unit.shield
	)

	if self.reward_only_positive:
	reward = abs(delta_enemy + delta_deaths) # shield regeneration
	else:
	reward = delta_enemy + delta_deaths - delta_ally