Calculating Reward on Game End
Closed this issue · 5 comments
If you look at the stream of rewards below (all of Game #2), you will see that it ends in a victory for Dire, yet each agent shows only one death. Also, judging by tower_hp, the tower was not even close to being destroyed. That means the game ended because the Radiant agent died a second time, but Player 0's rewards never show the -3.0 death penalty a second time.
This makes me believe that we don't capture the rewards between the last reward sync and the game end.
2019-02-05 10:59:50,789 INFO === Starting Game 2.
2019-02-05 10:59:50,789 INFO Starting game.
2019-02-05 10:59:50,797 INFO Player 0 using weights version 0
2019-02-05 10:59:50,802 INFO Player 5 using weights version 0
2019-02-05 11:00:16,411 INFO Player 0 rollout.
2019-02-05 11:00:16,412 INFO Player 0 reward sum: -0.11 subrewards:
{'death': -0.0,
'denies': 0.0,
'enemy': -0.114,
'hp': 0.0,
'kills': 0.0,
'lh': 0.0,
'tower_hp': 0.0,
'win': 0.0,
'xp': 0.0}
2019-02-05 11:00:16,429 INFO Player 5 rollout.
2019-02-05 11:00:16,430 INFO Player 5 reward sum: 0.11 subrewards:
{'death': -0.0,
'denies': 0.0,
'enemy': -0.0,
'hp': 0.0,
'kills': 0.0,
'lh': 0.0,
'tower_hp': 0.0,
'win': 0.0,
'xp': 0.114}
2019-02-05 11:00:33,551 INFO Received new model: version=0, size=1472372b
2019-02-05 11:00:40,146 INFO Player 0 rollout.
2019-02-05 11:00:40,147 INFO Player 0 reward sum: -0.15 subrewards:
{'death': -3.0,
'denies': 0.0,
'enemy': 3.0988716954415696,
'hp': -1.2002411301619431,
'kills': 0.0,
'lh': 0.0,
'tower_hp': -0.015,
'win': 0.0,
'xp': 0.9700000000000001}
2019-02-05 11:00:40,158 INFO Player 5 rollout.
2019-02-05 11:00:40,159 INFO Player 5 reward sum: 0.15 subrewards:
{'death': -3.0,
'denies': 0.2,
'enemy': 3.245241130161943,
'hp': -1.2005383621082364,
'kills': 0.0,
'lh': 0.0,
'tower_hp': -0.058333333333333334,
'win': 0.0,
'xp': 0.96}
2019-02-05 11:00:56,220 INFO Player 0 rollout.
2019-02-05 11:00:56,221 INFO Player 0 reward sum: -6.98 subrewards:
{'death': -0.0,
'denies': 0.0,
'enemy': -0.61683011154303,
'hp': -1.3583716176202625,
'kills': 0.0,
'lh': 0.0,
'tower_hp': 0.0,
'win': -5.0,
'xp': 0.0}
2019-02-05 11:00:56,226 INFO Player 5 rollout.
2019-02-05 11:00:56,227 INFO Player 5 reward sum: 6.72 subrewards:
{'death': -0.0,
'denies': 0.0,
'enemy': 1.3602503028054476,
'hp': -0.3734285740740741,
'kills': 0.0,
'lh': 0.0,
'tower_hp': -0.018333333333333333,
'win': 5.0,
'xp': 0.756}
2019-02-05 11:00:56,232 INFO Game finished.
Both players died here, within the same rollout. Then one player won in the last rollout. The rewards are aggregated only per rollout.
It doesn't matter how he won (death or tower). A win can come from either a death or the tower, but the cause isn't scored separately, and there's no need for it to be.
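A minimal sketch of what per-rollout aggregation amounts to, assuming (hypothetically) that each step yields a dict of subrewards and that the logged dict is their element-wise sum, with the "reward sum" line being the total of that dict. `aggregate_rollout` and `reward_sum` are illustrative names, not functions from the project. The second example uses Player 0's rollout 2 subrewards copied from the log.

```python
def aggregate_rollout(step_rewards):
    """Element-wise sum of per-step subreward dicts for one rollout (hypothetical)."""
    totals = {}
    for step in step_rewards:
        for key, value in step.items():
            totals[key] = totals.get(key, 0.0) + value
    return totals


def reward_sum(subrewards):
    """Scalar reward for a rollout, as printed in the 'reward sum' log line."""
    return sum(subrewards.values())


# Two deaths inside a single rollout would still show up as -6.0.
steps = [{"death": -3.0, "hp": -0.5}, {"death": -3.0, "hp": -0.25}]
print(aggregate_rollout(steps))  # {'death': -6.0, 'hp': -0.75}

# Player 0, rollout 2, copied from the Game 2 log.
p0_rollout2 = {
    "death": -3.0,
    "denies": 0.0,
    "enemy": 3.0988716954415696,
    "hp": -1.2002411301619431,
    "kills": 0.0,
    "lh": 0.0,
    "tower_hp": -0.015,
    "win": 0.0,
    "xp": 0.9700000000000001,
}
print(round(reward_sum(p0_rollout2), 2))  # -0.15
```

The logged sum matches, so whatever lands in a rollout does reach the reward; the open question is whether the steps after the last sync ever land in one.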
But a single player needs to die twice for the game to be over. According to the record, each player died only once, so the game should not have been over. What I believe happened is that a player died a second time and that this death was not captured in our reward aggregation.
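The record can be checked by tallying the death subreward per player across the three rollouts. This is a minimal sketch: the values are copied from the Game 2 log (only the death and win keys), and it assumes each death costs exactly -3.0, the only nonzero death value that appears there.

```python
# Subrewards from the Game 2 log, one dict per rollout, in log order.
rollouts = {
    0: [
        {"death": -0.0, "win": 0.0},
        {"death": -3.0, "win": 0.0},
        {"death": -0.0, "win": -5.0},
    ],
    5: [
        {"death": -0.0, "win": 0.0},
        {"death": -3.0, "win": 0.0},
        {"death": -0.0, "win": 5.0},
    ],
}

# Assumed per-death penalty (the only nonzero 'death' value in the log).
DEATH_PENALTY = -3.0


def count_deaths(player_rollouts):
    """Infer how many deaths were rewarded from the summed 'death' subreward."""
    total = sum(r["death"] for r in player_rollouts)
    return round(total / DEATH_PENALTY)


for player, rs in rollouts.items():
    win = sum(r["win"] for r in rs)
    print(f"Player {player}: {count_deaths(rs)} death(s), win={win}")
# Player 0: 1 death(s), win=-5.0
# Player 5: 1 death(s), win=5.0
```

Player 0 is charged the -5.0 loss but only one death, which is the inconsistency described above.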
But then doesn't the bot that died a second time miss out on the negative reward for that death? Sure, it "loses" and gets the -5, but it won't necessarily connect the loss to the second death, because that death never shows up in its rewards.
Or am I misunderstanding something about our algorithm?
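One way to picture the suspected failure and a possible fix, as a sketch under loudly hypothetical assumptions: rewards are buffered per step and emitted every rollout_size steps, and the terminal win/loss reward is attached to the final rollout at game end. None of this is the project's actual code; `build_rollouts`, `flush_on_end` and `total` are illustrative names.

```python
def build_rollouts(step_rewards, rollout_size, win_reward, flush_on_end=True):
    """Group per-step subreward dicts into rollouts (hypothetical model).

    With flush_on_end=False, steps after the last full rollout are discarded
    at game end, which is the failure mode suspected in this issue.
    """
    rollouts, buffer = [], []
    for step in step_rewards:
        buffer.append(step)
        if len(buffer) == rollout_size:
            rollouts.append(buffer)
            buffer = []
    if not flush_on_end:
        buffer = []  # rewards since the last sync are silently lost
    # The terminal win/loss reward is attached to the final rollout either way.
    buffer.append({"win": win_reward})
    rollouts.append(buffer)
    return rollouts


def total(rollouts, key):
    """Sum one subreward across every step of every rollout."""
    return sum(step.get(key, 0.0) for r in rollouts for step in r)


# Hypothetical steps for Player 0: one death mid-game, a second one just before the end.
steps = [{}, {"death": -3.0}, {}, {}, {"death": -3.0}]
print(total(build_rollouts(steps, 2, -5.0, flush_on_end=False), "death"))  # -3.0
print(total(build_rollouts(steps, 2, -5.0, flush_on_end=True), "death"))   # -6.0
```

With flush_on_end=False the second death disappears while the -5 loss is still recorded, which mirrors the Game 2 log; flushing the partial buffer before appending the terminal reward keeps the second -3.0 in the final rollout.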